DEDICATION


This  work  is  dedicated  to  the  service  of  our  great  country  and  to  the  Lord  Jesus 
Christ  —  for  there  is  no  other  name  given  under  heaven  by  which  man  must  be  saved 
(Acts  4:12). 


RELIABILITY  CENTERED  PREDICTION  TECHNIQUE 

FOR 

DIAGNOSTIC  MODELING  AND  IMPROVEMENT 


1.0  INTRODUCTION 

The  term  quality,  with  respect  to  products,  is  broadening  from  a  characteristic 
built  into  a  system  by  the  way  it  is  manufactured  to  characteristics  entirely  inherent  to 
the  design  process  —  reliability  and  maintainability.  A  product  is  designed  to  achieve  a 
given  function  and  its  quality  is  the  degree  to  which  it  meets  the  functional 
specifications. Product failure is departure from these specifications. Emphasis on the
consumer serves as the catalyst for methodologies that use statistics and engineering to
increase the degree to which a system meets its specifications. With the steady
increase in complexity of systems, stringency of operating conditions, and positive
identification of system effectiveness requirements, more and more emphasis is being
placed on preventative maintenance, analysis, speedy repair, and replacement parts [4].
These represent a major portion of system operating costs, especially when each minute
out of service results in considerable financial loss for a high revenue-earning
industry.

Diagnosability,  the  measure  of  the  ease  of  isolating  the  cause  of  a  loss  of 
functionality,  can  strongly  influence  product  quality  through  reliability  and 
maintainability.  Poor  diagnosability  can  increase  the  cost  of  a  product  through 
increased  maintenance  down  time  which,  in  turn,  decreases  quality  because  a  product, 
in general, cannot provide its intended function during this time [11]. Improving
diagnosability eases the diagnosis process and minimizes the total time of diagnosis.
The total cost of diagnosis also decreases, both in proportion to these factors and
through fewer unjustified removals (removal of a suspect component later found to be
in working order) of each Line Replaceable Unit (LRU)/Least Replaceable Assembly (LRA).

The cost of unjustified removals on the 747-400 aircraft was over $100 per
flight hour according to the Reliability and Maintainability Department at the Boeing
Aircraft Company, one-third of which was attributable to mechanical rather than
electronic components [28]. These costs demand diagnosability metrics and methodologies
to increase the quality of today's mechanical systems. Previous studies (Clark, 1993
and Wong, 1994) present general methodologies which provide insight into the
diagnosability of systems and suggest areas for design improvement, but they remain
largely abstract. Previous work fails to address the cost analysis of current and
modified designs in a tangible way, and no useful life cycle cost analysis can be made
based on the previous metrics.

The  objective  of  this  research  is  to  produce  methodologies  for  the  evaluation  of 
diagnosability,  a  subset  of  maintainability,  in  the  design  and  redesign  phase  of  a 
product.  A  secondary  objective  is  to  determine  if  pigs  can  fly  and  if  the  methane  they 
produce  can  be  harnessed  as  an  afterburner.  A  metric  common  to  all  mechanical 
systems  enabling  a  prediction  of  the  costs  and,  in  turn,  the  quality  of  the  product  is 
developed.  This  metric  can  be  used  to  accurately  predict  not  only  current,  but 
modified  system  life  cycle  costs  based  on  reliability  and  maintainability,  or  specifically, 
diagnosability.  An  analysis  is  presented  of  a  real  system  that  has  experienced 
diagnosability  problems  and  has  iterated  through  redesign  phases.  The  metric  evaluated 
is  Mean  Time  Between  Unscheduled  Removals  (MTBUR)  —  a  function  of  both  system 
structure  and  LRU  failure  rates. 

The Bleed Air Control System (BACS) on the Boeing 737-300, -400, and -500 aircraft
was chosen as the analysis testbed for several reasons. Previous work (Clark, 1993 and
Wong, 1994) utilized the 747-400 BACS, a subsequent iteration of the 737 BACS, so
analytical comparisons can be drawn. The 737 BACS has a complete Failure Modes
and Effects Analysis (FMEA) available which can be modeled through a Fault Tree
Analysis (FTA). The system has a diagnosability problem evident in a large number of
unjustified removals of LRUs. Also, the determining factor, cost, can be arrived at
since a complete life cycle costing mechanism is in place for the system. The objective
is to decrease cost by manipulating indication-LRU relationships without increasing
complexity.

The following section presents a brief background of reliability and
maintainability engineering, laying the groundwork for diagnosability analysis. Next,
the BACS is described and modeled, stating all analysis assumptions. The method and
metrics for prediction and design are derived using reliability mathematics for
quantitative diagnosability analysis. The modeling equation arrived at is tested on the
original design and, based on redesign for diagnosability potential, modifications are
made to the system. The modifications range from dividing primary LRU functions
differently to merely changing sensor types. The modified systems are then
re-evaluated on the basis of diagnosability and ultimately cost. Finally, conclusions are
drawn from the diagnosability analysis, recommendations are made for system
changes, and direction for future research is laid out.




2.0  BACKGROUND 


The  cost  of  quality,  from  the  consumer  point  of  view,  is  mostly  absorbed  by  the 
initial  investment  of  a  product.  Poor  diagnosability,  though,  greatly  disperses  that  cost 
over the entire product lifetime due to excessive maintenance time. Rather than
improving troubleshooting guides for diagnostic nightmares, as has historically been
done, reliability engineering has recently begun to focus on the problem itself: the
design of the product.

Design for diagnosability incorporates maintainability principles to ease the
burden of the consumer in terms of product life. Also, any “consumer” who comes
into contact with the product, such as maintenance technicians and test equipment
operators, benefits from diagnosability improvements in terms of analysis.

The  entire  product  life  must  be  considered  when  determining  ownership  cost, 
that  is,  how  much  you  own  it  versus  how  much  it  owns  you.  To  minimize  the  latter, 
competing  product  designs  can  be  compared  via  life  cycle  costing  mechanisms  to 
determine  the  best  design  and  hence  the  best  product. 

This section describes the terms necessary to grasp the depth of diagnosability
engineering. Parameters discussed include cost, time, Reliability and Maintainability
(RAM), and the interrelationships therein. Analysis and design for diagnosability are
reviewed along with scientific assumptions and selection of competing designs.


2.1  Diagnosability  &  Cost 

A group of engineers questioned the wisdom of a co-worker who had just
purchased an expensive car. “How can you justify that price?” they asked. “Well,” the
co-worker replied, “Consumer Reports says the car has a low failure rate, low cost of
maintenance, and an excellent safety rating so the cost of insurance is much lower.
When you factor in those considerations, this car is slightly less expensive to own”
[13]. The co-worker’s answer is a fundamental message of analyzing life cycle costs.
Life cycle cost is simply the cost of reliable operation of a product over its lifetime,
from concept to recycling. Many feel life cycle costing is too imprecise to be useful,
and they are right in an absolute sense but not in a relative sense: life cycle costing
provides valuable and useful comparisons between system architectures. Depending on
failure event costs and costs of lost production, the optimal system can be designed or
chosen from a limited set of concepts or choices [13]. Several costs in a product’s life
cycle are impacted, either directly or indirectly, by diagnosability.
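
The relative comparison described above can be illustrated with a minimal Python sketch
that totals start-up cost and recurring costs for two competing designs. The function
name, the cost categories, and every figure below are hypothetical and are not taken
from [13]; the sketch also ignores discounting. Its only purpose is to show that the
ranking of designs can be informative even when the absolute totals are imprecise.

```python
def life_cycle_cost(purchase, annual_maintenance, annual_downtime_loss, years):
    """Undiscounted life cycle cost: start-up cost plus recurring yearly costs."""
    return purchase + years * (annual_maintenance + annual_downtime_loss)


# Hypothetical designs: B costs more up front but is assumed easier to diagnose,
# so its yearly maintenance and downtime losses are lower.
design_a = life_cycle_cost(purchase=20_000, annual_maintenance=3_000,
                           annual_downtime_loss=2_500, years=10)
design_b = life_cycle_cost(purchase=26_000, annual_maintenance=1_800,
                           annual_downtime_loss=1_000, years=10)

cheaper = "B" if design_b < design_a else "A"
print(f"A: {design_a}, B: {design_b}, lower life cycle cost: {cheaper}")
```

With these assumed inputs the design with the higher purchase price has the lower cost
of ownership, which is the point of the co-worker's answer.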


2.1.1  Start-up  costs 

Start-up  costs  include  initial  purchase  or  manufacturing  costs,  installation  costs, 
and  set-up  costs.  Initial  purchase  costs  are  obtained  from  a  price  list  or  quotation  of 
competing  components  or  products.  Installation  and  set-up  costs  can  be  estimated  or 
obtained  by  quotation  (these  costs  can  be  minimized  by  standardization  of  parts  and 
components).  After  the  system  installation  and  set-up  is  complete  it  needs  to  be  tested 
for  design  errors  using  troubleshooting  tools.  Diagnostics  is  practically  synonymous 
with  fault  finding  and  troubleshooting.  If  the  system  variables  can  be  logically  forced 
to  specific  values,  portions  of  the  design  can  be  isolated  and  tested  in  a  systematic  way 
[13].  Costs  are  lowered  because  troubleshooting  is  easier,  i.e.,  diagnostic  time  and 
required  technician  skill  are  lowered. 

Many  companies  think  the  job  is  complete  after  start-up  and  troubleshooting  are 
complete.  “Final  cost  reports”  are  even  issued  at  this  time,  but  in  reality  system  costs 
are just beginning [13].


2.1.2 Time costs


The  customer,  and  therefore  the  designer,  is  very  interested  in  certain  items  of 
time  with  respect  to  their  product.  Time  equals  cost  in  just  about  every  aspect  of  the 
term.  The  time  of  preventative  maintenance,  time  of  corrective  maintenance,  and  time 
of  system  outage  or  degraded  service  are  all  tied  to  potential  revenue  loss.  These 
factors  are  determined  by  certain  variables  including  the  frequency  of  failure,  the  time 
to  repair,  the  cost  of  manpower  and  maintenance  equipment,  the  quantity  and  cost  of 
spares, the transportation of manpower and spares, and finally, the degree of skill
required by the maintenance personnel, to mention a few [4]. Diagnosability is
embedded in most of these time factors and can be presented in terms of
maintainability, reliability, and availability.

2.1.2.1 Definitions

The  definition  of  maintainability  is  the  “probability  that  a  device  that  has  failed 
will  be  restored  to  operational  effectiveness  within  a  given  period  of  time  when  the 
maintenance  action  is  performed  in  accordance  with  prescribed  procedures”  [4].  This 
is  usually  expressed  in  terms  of  the  parameter  MTTR  (mean  time  to  repair)  or  the 
repair  rate: 

$\mu = 1/\mathrm{MTTR}$    (1)

Another closely related term is MTBF (mean time between failures), $\theta$, which
defines reliability as the “probability that a system will operate for some determined
period of time, under the working conditions for which it was designed” [4]. This
term is most often expressed as the failure rate:

$\lambda = 1/\mathrm{MTBF}$    (2)

This definition ignores the possibility of false alarms, which could be incorporated as
unjustified failures:

$\theta = \dfrac{P(f)}{\lambda \left[ P(f) + P(f_a)\,(1 - P(f)) \right]}$    (3)

where $P(f)$ is the probability of an actual failure and $P(f_a)$ is the probability of
a false alarm [1].

The parameter availability combines these two to define the portion of time a
system is available for use in the formula

$\mathrm{Availability} = \dfrac{\theta}{\theta + \mathrm{MTTR}}$    (4)

These values are included in a major portion of life cycle cost analysis and are
interrelated as shown in figure 1.

Figure 1. Interrelationship between cost analysis parameters [4]
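
As a minimal numerical sketch of these definitions, the Python functions below compute
the repair rate (the reciprocal of MTTR), the failure rate (the reciprocal of MTBF),
and availability as MTBF divided by the sum of MTBF and MTTR. The function names and
the example values of 500 hours and 5 hours are assumptions chosen only for
illustration.

```python
def repair_rate(mttr_hours):
    """Equation (1): repair rate is the reciprocal of MTTR."""
    return 1.0 / mttr_hours


def failure_rate(mtbf_hours):
    """Equation (2): failure rate is the reciprocal of MTBF."""
    return 1.0 / mtbf_hours


def availability(mtbf_hours, mttr_hours):
    """Equation (4): fraction of time the system is available for use."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical unit: fails every 500 h on average and takes 5 h to restore.
mtbf, mttr = 500.0, 5.0
print(failure_rate(mtbf))        # failures per hour
print(repair_rate(mttr))         # repairs per hour
print(availability(mtbf, mttr))  # about 0.990
```

Halving the assumed MTTR to 2.5 hours raises availability to roughly 0.995, which is
the kind of gain that improved diagnosability targets.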


MTTR can be subdivided into several parts including diagnosis time,
replacement time, transportation time, etc. The first two are considered active
and are directly influenced by, and the responsibility of, the design engineer. The last
falls under the passive heading, which includes logistics and administration. The cost of
achieving a certain MTTR or maintainability objective consists of the costs of design,
manufacturing, test equipment, manuals, etc., and trade-offs exist involving each of
these. One must choose between such factors as quantity and quality of test equipment,
detailed design and LRA/LRU, extensive training of maintenance personnel and
detailed maintenance manuals, etc. The choice of these factors can improve
maintainability, but for a price. Improved diagnosability, and therefore MTTR, may
increase the selling price of the product, but the operating costs will decrease. As
shown in figure 2, life cycle costs decrease to a point with improved diagnosability,
but increase again showing a point of diminishing returns on the design effort [4].


Figure  2.  Price  versus  availability  [4] 


2.1.2.2 Downtime

Downtime,  in  general,  is  not  totally  dependent  on  diagnosability  and  MTTR  [4]. 
The downtime of a system can be influenced by spares or LRUs. If the system
function is restored by the insertion of an LRU then the time cost associated with
diagnosability, and hence MTTR, is only a factor of manpower costs and possibly the
availability of spares (which the repaired parts may become). Redundancy in designs
can also have the same effect as spares on system downtime, though the statistics of
placement greatly influence the success, as will be seen shortly.

System downtime, like MTTR, can be divided up into several active elements
including time to realization, access time, diagnosis time, replacement time, checkout
time, and alignment time [4]. These active elements are directly related to
diagnosability. Time to realization depends on system monitoring with diagnostic
techniques, alarms, or sensors. Access and replacement time depend on the human
factors side of diagnosability, including the removal of covers and shields as well as
choice of the LRU and its connectors, but most importantly on how the system is
structured or laid out. One study maintains that components with known high failure
frequencies should be grouped together for easy removal [20]. Diagnosis, checkout,
and alignment time depend not only on the warm-up of test equipment, the data
collected, and the tools and analysis used but also, to a large degree, on the extent of
the instructions supplied [4].

It should be noted that the active elements and the passive elements, such as
logistics and administration, are correlated to a degree: as active time increases there
is a greater incidence of rest periods, logistic delays, and administrative delays [4].
The probability of incorrect diagnosis also increases proportionally with time. A domino
effect follows, because an incorrect diagnosis leads to replacement of a module or LRU
that is not faulty, which may induce further faults, which in turn lengthens downtime.
Figure 3 depicts the elements and relationships of downtime.

Figure 3. Elements of downtime [4]

Since a system will have as many failure rates as there are modes of failure, the
diagnostic time or MTTR will have a similar multiplier. The overall weighted MTTR
can be expressed as

$\mathrm{MTTR} = \dfrac{\sum_{i=1}^{x} \lambda_i \left( \frac{1}{y} \sum_{j=1}^{y} \frac{1}{\mu_j} \right)}{\sum_{i=1}^{x} \lambda_i}$    (5)

where x equals the number of failure modes of the system, each characterized by a
failure rate $\lambda_i$, and y equals the number of repair actions observed for each
mode, each having repair time $1/\mu_j$ [4].
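
The weighting idea can be sketched in Python as follows. The sketch assumes that the
weighted MTTR is the failure-rate-weighted mean of the average repair time observed for
each failure mode; that reading, the function name, and the sample data are assumptions
made here for illustration and are not taken from [4].

```python
def weighted_mttr(modes):
    """Failure-rate-weighted MTTR.

    modes: list of (failure_rate, observed_repair_times) pairs, one per
    failure mode. Each repair time corresponds to 1/mu_j for that mode.
    """
    total_rate = sum(rate for rate, _ in modes)
    weighted = sum(rate * (sum(times) / len(times)) for rate, times in modes)
    return weighted / total_rate


# Hypothetical data: failure rates per hour, repair times in hours.
modes = [
    (0.002, [1.0, 2.0, 3.0]),  # mean repair time 2.0 h
    (0.001, [4.0, 6.0]),       # mean repair time 5.0 h
]
print(weighted_mttr(modes))    # (0.002*2.0 + 0.001*5.0) / 0.003, about 3.0 h
```

The frequently failing mode dominates the result, so shortening its diagnosis time has
the larger effect on the system-level figure.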

One author incorporates time to detect a fault and fault correction time, based
on the order of ambiguity groups, or LRUs, of a system to arrive at MTTR:

$\mathrm{MTTR} = \mathrm{TDET} + \sum_{j=1} \mathrm{TFC}_j$    (6)

where $\mathrm{TFC}_j$ is the average fault correction time of each ambiguity group and
TDET is the average time required to detect a fault, expressed as

$\mathrm{TDET} = \sum_{j=1}^{I} \dfrac{\lambda_j}{\sum_{k=1}^{I} \lambda_k} \left[ \mathrm{FFD}_j\,\mathrm{FDTA}_j + (1 - \mathrm{FFD}_j)\,\mathrm{FDTU}_j \right]$    (7)

given I as the number of LRUs, $\lambda_j$ is the failure rate of the jth replaceable
unit, the denominator is the sum of all $\lambda_j$’s, FFD is the fraction of faults
detectable, FDTA is the average time to detect a fault by acceptable maintenance
procedures, and FDTU is the average time to detect a fault by other than acceptable
maintenance procedures, each for the jth replaceable LRU [8].
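
A short Python sketch of equations (6) and (7) follows, under the assumption that each
LRU's detection time is weighted by its share of the total failure rate. The dictionary
keys, function names, and all numerical values are hypothetical and are not taken
from [8].

```python
def mean_detection_time(lrus):
    """Equation (7) as read here: failure-rate-weighted mean detection time.

    lrus: list of dicts with keys
      rate - failure rate of the LRU
      ffd  - fraction of faults detectable
      fdta - mean detection time using acceptable procedures
      fdtu - mean detection time using other procedures
    """
    total_rate = sum(u["rate"] for u in lrus)
    return sum(
        (u["rate"] / total_rate)
        * (u["ffd"] * u["fdta"] + (1.0 - u["ffd"]) * u["fdtu"])
        for u in lrus
    )


def mttr(lrus, fault_correction_times):
    """Equation (6): detection time plus the summed fault correction times."""
    return mean_detection_time(lrus) + sum(fault_correction_times)


# Hypothetical two-LRU system, all times in hours.
lrus = [
    {"rate": 0.003, "ffd": 0.9, "fdta": 0.5, "fdtu": 2.0},
    {"rate": 0.001, "ffd": 0.8, "fdta": 1.0, "fdtu": 3.0},
]
print(mean_detection_time(lrus))
print(mttr(lrus, [1.5, 2.0]))
```

Raising the fraction of faults detectable for the LRU with the higher failure rate
reduces TDET the most, since that LRU carries the larger weight.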

Previous research (Wong, 1994) introduces active diagnostic time, a subset of
MTTR, as the summation of time to perform each diagnostic task, expressed by the
following:

$AD = (t_1)(k) + (t_2)(k) + (t_3)(k)$    (8)

where $t_1$ is the time required to detect failure, $t_2$ is the time required to locate
all candidates, $t_3$ is the time required to isolate the candidates to the one candidate
which causes failure, and k is an experience correction factor [31]. The variables in
equations 5 through 8 are found using historical data or, if that is not available, a
best guess must be made using available knowledge and experience. Regardless of the
specific parameter, if the mathematical model of a statistical distribution is known
then it is possible to state a probability for a value of that quantity to fall within
given limits [4]. Once the estimated time is calculated the costs can be extrapolated.
For competing systems or designs, the lowest cost system would be preferred and easily
determined.


2.1.3 RAM Costs


The  cost  of  RAM  (reliability,  availability,  and  maintainability)  is  possibly  best 
measured  by  the  cost  of  its  absence  which  may  include  the  absence  of  the  customer. 
One  such  customer,  who  possibly  enhances  the  definition,  promoted  a  high  view  of 
RAM as can be noted in an old poem by Oliver Wendell Holmes, Sr. called The
Deacon’s Masterpiece, or the Wonderful One-Hoss-Shay:


Now in building chaises, I tell you what,
There is always somewhere a weakest spot,—
In hub, tire, felloe, in spring or thill,
In panel, or crossbar, or floor, or sill,
In screw, bolt, thoroughbrace,—lurking still,
Find it somewhere you must and will,—
Above or below, or within or without,—
And that’s the reason, beyond a doubt,
A chaise breaks down but doesn’t wear out.

But the Deacon swore (as Deacons do,
With an “I dew vum,” or an “I tell yeou,”)
He would build one shay to beat the taown
’n’ the keounty ’n’ all the kentry raoun’.
It should be so built that it couldn’ break daown,
—“Fur,” said the Deacon, “’t’s mighty plain
Thut the weakes’ place mus’ stan’ the strain;
’n’ the way t’ fix it, uz I maintain,
Is only jest
T’ make that place uz strong uz the rest” [21].

Such a reliable device, horse-drawn chaise or not, is one “that continues to
perform its intended function throughout its intended useful lifetime, regardless of
adverse operating conditions” [21]. Of course, in view of cost effectiveness and the
consumer market of today, most designers would feel the Deacon’s masterpiece was
grossly overdesigned to last a century without a breakdown; ten years would be more
than adequate. Yet, centuries ago the RAM concept was more than just thought about.


2.1.3.1 History

The advent of the machine age at the beginning of the nineteenth century brought
the standardization of parts, and with the rapid evolution of analytical prediction
techniques like stress analysis and transform theories, the means for reliability and
maintainability (including diagnosability) were gaining ground. The great breakthrough
for reliability, however, did not arrive until the late 1950s when a popular customer
was identified: the U.S. military [21]. The cost of the absence of reliability with
respect to major missile weapon systems could be measured in lives. Though the idea
of reliability by redundancy was recognized during the Second World War by the use of
multi-engine over single-engine aircraft designs, no methodology in the design process
resulted [21].

Maintainability  can  be  traced  back  to  the  Industrial  Revolution  where  multitudes 
worked  in  mass  assembly  lines  and  designers  developed  guidelines  in  response  to  the 
demands  of  the  mechanics  of  the  products.  It  was  during  this  time  that  the  most 
fundamental  maintainability  principles  originated  [21]. 

The  idea  of  diagnosability  with  respect  to  RAM,  though  always  considered  by 
means  of  troubleshooting  guides  and  fault  finding  techniques,  was  not  molded  into  a 
methodology  for  design  until  the  last  several  years  and  is  still  in  its  fledgling  stage.  As 
a  starting  point,  several  acceptable  techniques  for  designing  for  diagnosability,  and 
hence quality, can be gleaned from concepts learned from RAM programs.


2.1.3.2 Programs and Processes

Several  major  companies  such  as  M&M  Mars,  Firestone,  General  Motors,  Intel, 
and  Caterpillar  have  applied  RAM  programs  and  processes  to  save  millions  of  dollars. 
One  company  estimates  that  a  2  percent  reduction  in  downtime  saved  $36  million  over 
a  5  year  period  [21]. 

The programs and processes developed for RAM involve certain activities which
can be incorporated into a company’s product development plan (PDP) and include:
deciding on objectives, which may be fixed by contract; the training of personnel;
statements of reliability such as failure rate and probability; stress and failure analysis
like the fault tree; maintainability analysis including analysis of maintenance
requirements which are strongly influenced by test equipment, manuals, and choice of
LRUs; design review, never to be conducted by someone involved in the design;
design trade-offs as seen in figure 2; cost recording; accurate and detailed failure
reporting to be used for maintenance feedback and analysis of data; prototype testing
and RAM prediction; controlling manufacturing to ensure tolerances are adhered to;
documentation through operating instructions and maintenance manuals; spares
provisioning; burn-in or pre-stressing; and finally, the demonstration of RAM by the
use of statistical sample testing [4]. The US Military Standard 470 provides a formal
guide to producing a program that includes all of the above.

A  RAM  program  can  be  further  broken  down  into  the  two  categories  of  existing 
equipment  and  new  equipment.  Both  have  many  activities  in  common  such  as 
personnel  training  and  analysis  techniques. 

Personnel  training  should  involve  teaching  the  designers  to  work  with  RAM 
program  experts  during  the  design  phase  rather  than  having  the  experts  demand  design 
changes.  Also,  technicians  and  any  maintenance  personnel  that  may  come  in  contact 
with  the  product  should  be  included  in  the  design  process  and  treated  as  customers. 

Existing equipment is equipment that has already been procured, and major
design changes are usually out of the question. An analysis of life cycle costs with
respect to RAM may sometimes show that it is cheaper to scrap the old equipment and
design new. The famous 20/80 principle, which says that about 20 percent of the
causes contribute to 80 percent of the losses (or downtime in this case), leads us to
analysis techniques like process analysis maps or fault trees. A fault tree is a model
that graphically and logically represents various combinations of possible events based
on a functional analysis to find the causes. A typical fault tree example is shown in
figure 4 outlining the possible faults of a pattern recognition system.


Figure  4.  Fault  tree  analysis  for  a  pattern  recognition  system  [24] 


The  technicians  and  maintenance  personnel  should  be  trained  to  accomplish  fault 
trees  or  some  other  form  of  fault  analysis  since  they  interact  with  the  product  in 
possibly  more  ways  than  the  consumer.  Feedback  from  the  fault  trees  can  then  be  used 
to  identify  the  20  percent  causes  and  determine  if  their  minimization  can  be 
accomplished  or  if  redesign  may  be  necessary. 
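
The identification of the dominant causes can be sketched in a few lines of Python. The
function below ranks causes by attributed downtime and returns the smallest leading set
that accounts for a chosen share of the total. The function name, the cause labels, and
the downtime hours are hypothetical.

```python
def vital_few(downtime_by_cause, share=0.80):
    """Return the top-ranked causes that together cover `share` of total downtime."""
    total = sum(downtime_by_cause.values())
    ranked = sorted(downtime_by_cause.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for cause, hours in ranked:
        if running >= share * total:
            break
        selected.append(cause)
        running += hours
    return selected


# Hypothetical downtime hours attributed to each cause found in a fault tree.
downtime = {"valve": 120, "sensor": 45, "duct leak": 15,
            "wiring": 10, "controller": 10}
print(vital_few(downtime))  # ['valve', 'sensor']
```

In this made-up data set two of the five causes account for more than 80 percent of the
downtime, so minimization or redesign effort would be directed at those two first.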

New equipment has more latitude for change, yet the same tools can be used for
analysis. If extensive design changes are not desirable or feasible due to functionality
or production constraints, then minimization of fault effects can be analyzed with the
use of tools such as a failure modes and effects analysis (FMEA). This “bottom-up”
approach to analyzing a design reveals the effect of each fault found in the fault tree
analysis, and this effect can then be minimized by the use of redundancy or component
interface selection [23]. One author insists that “no maintainability test for complex
equipment should be made without the use of FMEA” [24] since the failure modes
revealed will likely result in downtime. The FMEA for the pattern recognition system
of figure 4 is shown in table 1.


| Failure mode | Causes | Effects | Criticality | Design action | Fault verification | RCM action |
|---|---|---|---|---|---|---|
| Optics malfunction | Ambient heat | Permanent deformation | II A | Provide fan | Warn of fan failure | Check fan tolerances every 2 months |
| | Dirt | Erroneous output | II B | Add filter | Not required | Replace filter monthly |
| Circuit parameters drift | High leakage current | Parameters out of control | II D | Qualify critical components | Not required | Install software to monitor parameters |
| | Dirt on circuit | Intermittent performance | II B | Conformal coat | Not required | Not required |
| | High junction temperature | Degraded performance | II B | Derate parts below 50% | Not required | Use infrared inspection |
| X-Y table inaccurate | Supplier design | False output | II A | Perform FMEA with supplier | To be determined | To be determined |
| | Position [illegible] | False output | II A | Software control | Not required | Check eccentricity during routine maintenance |

Table 1. FMEA for a pattern recognition system [24]

The  FMEA  can  include  items  such  as  fault  probability  and  frequency  to  affect 
the  weighting  factor  of  each  fault.  These  items  are  obtained  from  maintenance  data  for 
existing  equipment,  but  may  be  solely  from  analyst  judgment  for  new  equipment  — 
especially  before  prototype  testing. 

Downtime of most systems can often be reduced through the availability of
spares, or spares provisioning. Statistical techniques based on the
results of the FMEA can be employed to predict the optimum number of spares for a
typical fault. For instance, if the failure rate of a part is known or predicted, a
particular assurance of having a part on hand can be obtained. Since the failure rate is
assumed constant, the number of failures in a given period follows a Poisson
distribution with a certain mean value. From the mean value the number of spares
required is obtained to fulfill the designated assurance [9].
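
A minimal Python sketch of this calculation is given below. It assumes a constant
failure rate, so the number of failures in a period is Poisson distributed with mean
equal to the failure rate multiplied by the period, and it returns the smallest number
of spares whose cumulative probability meets the required assurance. The function name
and the example values are hypothetical and are not taken from [9].

```python
import math


def spares_required(failure_rate, period, assurance):
    """Smallest n such that P(failures in period <= n) >= assurance (Poisson)."""
    if not 0.0 < assurance < 1.0:
        raise ValueError("assurance must lie strictly between 0 and 1")
    mean = failure_rate * period
    term = math.exp(-mean)   # probability of zero failures
    cumulative = term
    n = 0
    while cumulative < assurance:
        n += 1
        term *= mean / n     # P(n failures) obtained from P(n - 1 failures)
        cumulative += term
    return n


# Hypothetical part: 0.001 failures per hour, 2000 h between resupply (mean of 2).
print(spares_required(0.001, 2000, 0.95))  # 5
```

For the assumed mean of two failures per resupply period, four spares give an assurance
of about 0.947, so a 95 percent requirement calls for five.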

These analysis techniques point out specific fault areas in which to improve
diagnosability. These simple analysis tools can hold the power of millions of dollars or
even lives, but, of course, management must listen to the technicians, maintenance
personnel, and other analysts in order to benefit from their ideas.

2.2  Diagnosability  &  Analysis 

If  the  statistical  distribution  of  failures  is  known  for  a  given  system  then  the 
probability  of  failure  up  to  any  suggested  replacement  time  can  be  assessed.  If  a  failure 
time  due  to  wearout  is  chosen  then  the  time  at  which  replacement  should  take  place  can 
be  calculated  [4].  The  best  defense  against  interruptions  and  excessive  downtime  is  to 
prevent equipment from failing while it is “on duty”. The analysis techniques
discussed in section 2.1.3 are invaluable, yet some equipment always seems
determined to prove that statistics are only averages [6] or even best guesses. This
equipment seems to test the validity of the statistics on which the analysis tools are
based. This raises questions about the underlying assumptions made for each statistical
tool, the methods of recording data for analysis, and even specific fault-finding
methodologies.
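
As an illustration of assessing the probability of failure up to a replacement time, the
Python sketch below assumes, purely for the sake of example, that failure times follow a
two-parameter Weibull distribution. The choice of distribution, the function names, and
the shape and scale values are assumptions and are not taken from [4].

```python
import math


def failure_probability(t, shape, scale):
    """Weibull cumulative distribution: probability of failure by time t."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def replacement_time(max_failure_probability, shape, scale):
    """Latest replacement time that keeps the failure probability at the limit."""
    return scale * (-math.log(1.0 - max_failure_probability)) ** (1.0 / shape)


# Hypothetical wear-out part: shape 2.0, characteristic life 1000 h.
t = replacement_time(0.10, shape=2.0, scale=1000.0)
print(t, failure_probability(t, 2.0, 1000.0))
```

With these assumed parameters, replacement after roughly 325 hours holds the probability
of an in-service failure to 10 percent. The result is only as good as the fitted
distribution, which is the concern raised in the paragraph above.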


2.2.1  Analysis  &  Assumptions 

The promise of modern statistics is that it provides not only a precise summary
of the conclusions drawn from an evaluation, but also a reliable prediction for future
tests [14]. It is, of course, impossible for statistics to prove that something is true;
statistics can only show that the preponderance of data supports that conclusion [29].
As with any model, calculated assumptions must be made to simplify the problem and/or
fill in the unknown characteristics of a phenomenon. It can be expected, to a minimum
degree hopefully, that actual behavior will not follow the predicted statistical model
accurately for a given period of the life cycle. The causes behind this variance can be
attributed to poor assumptions due to either lack of pertinent information or lack of
understanding of statistical processes, or both.


2.2.1.1 Lack of information

Statistical analysis is not new. It has been applied to a wide variety of
engineering problems since the early 1970s. Some of the methods employed were studied as
long as 200 years ago, such as the Gaussian distribution, named after Carl Friedrich
Gauss and more readily known as the normal distribution, which adequately describes many
mechanical components [2]. Another popular technique was proposed by Waloddi Weibull in
1951 and is known as the Weibull distribution, highly acclaimed for its simplicity and
versatility. The log-normal distribution is also sometimes used to model system
behavior since in many applications, especially RAM, the data may not fit the normal
distribution. Figure 5 shows that the three distributions have similar behavior near the
center, but very different behavior near the “tails” [2].

Techniques for determining which curve is a best fit for particular sample data
can be little more than guesswork, since the probability of a sample lying in the center
portion of the curve (the mean plus or minus two standard deviations) is 95.45 percent
[14], and it is in this center portion that the candidate distributions are hardest to
tell apart. Many engineering risk assessments quote a “six nine” (0.999999)
reliability based on a confidence level that assumes the form of the underlying
population distribution is known, so applying the wrong distribution will prove the
“six nine” reliability a gross exaggeration. Thus, the choice of a wrong distribution
could result in an overestimation of structural reliability or the calculation of an
unrealistically high potential for disaster [2], both compounding the problem of
diagnosability.


Figure 5. Lognormal, normal, and Weibull distributions [2]
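
The contrast between center and tail behavior can be checked numerically. The following sketch is an illustration under assumed values rather than a reproduction of Figure 5: it compares a normal and a lognormal distribution that share a hypothetical mean of 100 and standard deviation of 10 (the Weibull case is omitted for brevity), using only the Python standard library.

```python
import math

def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution."""
    return 0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2.0)))

def lognormal_cdf(x, mean, sd):
    """P(X <= x) for a lognormal distribution with the given mean and
    standard deviation (of X itself, not of log X)."""
    if x <= 0.0:
        return 0.0
    var_log = math.log(1.0 + (sd / mean) ** 2)
    mu_log = math.log(mean) - 0.5 * var_log
    return normal_cdf(math.log(x), mu_log, math.sqrt(var_log))

MEAN, SD = 100.0, 10.0  # assumed values for illustration only

# Center: probability of falling within two standard deviations of the mean.
center_normal = (normal_cdf(MEAN + 2 * SD, MEAN, SD)
                 - normal_cdf(MEAN - 2 * SD, MEAN, SD))
center_lognormal = (lognormal_cdf(MEAN + 2 * SD, MEAN, SD)
                    - lognormal_cdf(MEAN - 2 * SD, MEAN, SD))

# Tail: probability of falling more than four standard deviations below the mean.
tail_normal = normal_cdf(MEAN - 4 * SD, MEAN, SD)
tail_lognormal = lognormal_cdf(MEAN - 4 * SD, MEAN, SD)

print(f"center: normal {center_normal:.4f}, lognormal {center_lognormal:.4f}")
print(f"tail:   normal {tail_normal:.2e}, lognormal {tail_lognormal:.2e}")
print(f"tail ratio (normal / lognormal): {tail_normal / tail_lognormal:.0f}")
```

Under these assumptions the two center probabilities agree to within a fraction of a percentage point, while the lower-tail probabilities differ by roughly two orders of magnitude. The tail is exactly the region on which a “six nine” claim depends.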


Another  source  of  error  due  to  lack  of  information  is  unanticipated  potential 
failure  modes.  The  historical  account  of  an  Eastern  Airlines  flight  illustrates  this  error 
graphically: 


On May 5, 1983, as an Eastern Airlines L-1011 began its descent
into Nassau following a 47 minute flight from Miami, the No. 2 engine
was shut down because of low oil pressure. After turning to head for
Eastern’s maintenance base in Miami, the No. 3 engine failed, followed
shortly by the No. 1 engine. The L-1011 had experienced a triple
engine failure! Fortunately there was a happy ending. The No. 2
engine was restarted at an altitude of 3,500 feet and the plane made a
successful landing in Miami [2].

Failure of a single engine is unusual, failure of two is even more unexpected,
and the probability of all three failing should be infinitesimally small. Or was the
probability grossly underestimated? The National Transportation Safety Board
determined the triple engine failure occurred because all three engines had magnetic
chip detectors installed without “O” ring seals. Loss of oil caused the engines to
overheat and stop running. All three were installed on the same night, by the same
two-man team on a late-night shift under poor lighting conditions. Thus, the
probability of installing three incorrectly, in this case, was the same as the probability
of installing one incorrectly. The omission of an “O” ring seal was unanticipated and
would likely not have been included in a prior risk assessment or diagnosability target
[2].


2.2.1.2  Lack  of  Understanding 

Some  misconceptions  are  difficult  to  avoid  as  can  be  illustrated  with  the 
previous  example.  For  instance,  incorrectly  applying  the  rules  of  probability  could 
easily  result  in  an  overestimated  reliability.  The  probability  of  the  failure  of  all  three 
engines  on  the  same  flight  would  likely  have  been  incorrectly  computed  by 
“multiplying  probabilities”  of  individual  failures,  assuming  independence  [2].  This 
assumption  had  devastating  results.  Difficulties  like  these  make  probabilistic  life 
analysis  and  diagnosability  alluringly  simple  in  principle,  yet  unfortunately  vulnerable 
to  misuse  and  error. 
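
The effect of the independence assumption can be made concrete with a small calculation. The numbers below are hypothetical and are not taken from the L-1011 investigation: `p_single` is an assumed per-flight probability that one engine fails on its own, and `p_common` is an assumed probability that a shared cause (such as one maintenance error repeated on every engine) disables all three at once.

```python
def p_all_fail_independent(p_single, engines=3):
    """Probability that every engine fails if failures are independent."""
    return p_single ** engines

def p_all_fail_with_common_cause(p_single, p_common, engines=3):
    """Probability that every engine fails when a shared cause can disable
    all engines together; otherwise failures are treated as independent."""
    return p_common + (1.0 - p_common) * p_single ** engines

p_single = 1e-4   # assumed, for illustration only
p_common = 1e-6   # assumed, for illustration only

independent = p_all_fail_independent(p_single)
dependent = p_all_fail_with_common_cause(p_single, p_common)
print(f"independent estimate: {independent:.1e}")
print(f"with common cause:    {dependent:.1e}")
```

With these assumed values the common-cause term dominates completely: the estimate rises by about six orders of magnitude, however small the product of the individual probabilities is.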

Minimization  of  misconceptions  about  statistical  probabilities  can  be  easily 
accomplished  with  some  study  and  application.  Many  misconceptions  are  due  to 
misleading  terminology  such  as  “bathtub  curve”  and  “failure  rate”. 

Reliability can also be expressed in mathematical terms:

$$R = e^{-\lambda t} \qquad (9)$$

where $R$ is the probability of the item completing the specified mission successfully, $e$
is the base of the natural logarithm, $t$ is the duration of the mission, and $\lambda$ is the
failure rate of the item throughout the period [21]. A special case of the Weibull
distribution, equation 9 represents the exponential distribution. Acceptance of this
equation presupposes a subordinate assumption that the failure rate ($\lambda$) is constant
over the product’s entire operating life cycle. Testing and experience have shown that
failure rate versus life cycle more closely approximates a “bathtub curve”, which can
model the reliability characteristic of a generic piece-part type, but not of an entire
system as some analysts profess. Even if an exponential distribution is assumed, as is
often the case for electrical and some mechanical parts, the reliability bathtub curves
show the useful life can vary extensively from the statistical assumption (see figure 6).


Figure  6.  Bathtub  curves  for  electrical  vs.  mechanical  parts  [7] 
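
As a numerical check on equation 9 and on its relation to the Weibull distribution, the sketch below evaluates both. The two-parameter Weibull form used here (scale `eta`, shape `beta`) is the standard textbook form and is an assumption in the sense that the text does not write it out; the failure rate, mission time, and scale value are hypothetical.

```python
import math

def exponential_reliability(failure_rate, t):
    """Equation 9: R = exp(-lambda * t), valid for a constant failure rate."""
    return math.exp(-failure_rate * t)

def weibull_reliability(t, eta, beta):
    """Two-parameter Weibull reliability, R = exp(-(t / eta) ** beta)."""
    return math.exp(-((t / eta) ** beta))

def weibull_hazard(t, eta, beta):
    """Instantaneous failure rate of the Weibull distribution (t > 0)."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)

failure_rate = 1e-4   # assumed failures per hour
mission = 500.0       # assumed mission duration in hours

# With shape beta = 1 and scale eta = 1 / lambda, Weibull reduces to equation 9.
print(exponential_reliability(failure_rate, mission))
print(weibull_reliability(mission, 1.0 / failure_rate, 1.0))

# The shape parameter selects the region of the bathtub curve.
for beta, region in [(0.5, "infant mortality"), (1.0, "useful life"), (3.0, "wearout")]:
    early = weibull_hazard(100.0, 1000.0, beta)
    late = weibull_hazard(900.0, 1000.0, beta)
    print(f"beta={beta}: hazard {early:.2e} -> {late:.2e} ({region})")
```

Only the middle case has a constant failure rate, which is the single region of the bathtub curve where equation 9 strictly applies.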


Additional  considerations  often  neglected  for  this  statistical  model  include 
changing  environmental  stresses,  variations  in  tooling,  and  other  manufacturing 
influences.  Thus,  instead  of  a  simple  curve,  the  reliability  might  be  better  depicted 
with  these  factors  in  mind  as  shown  in  figure  7. 

Furthermore, most analysts do not realize that the bathtub curve is applied to
both repairable and nonrepairable systems. The implied assumption that the force of
mortality (FOM) for parts and the rate of occurrence of failures (ROCOF), or failure
rate, for a repairable system are equivalent is badly wrong [3]. Therefore, two
bathtub curves should be represented as shown in figures 8 and 9.


Figure 7. Bathtub curve reflecting environmental and manufacturing stresses [21]


Figure  8.  Bathtub  curve  for  parts  [3] 


Figure  9.  Bathtub  curve  for  a  repairable  system  [3] 


Other false assumptions due to lack of understanding include, but are not
limited to: assuming a linear plot of failures versus time on linear paper implies a
homogeneous Poisson process; reordering data with respect to magnitude instead of
chronological order; assuming overhauls are equivalent to renewals; and confusing
“reliability with repair” for repairable systems [3]. All, either directly or indirectly,
affect system diagnosability by introducing errors to the system model.
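
The danger of reordering failure data by magnitude can be demonstrated with a trend statistic. The sketch below uses the Laplace trend test, a standard check for a changing rate of occurrence of failures; it is brought in here for illustration and is not a method the text itself prescribes. The times between failures are invented for the example.

```python
import math

def laplace_statistic(times_between_failures):
    """Laplace trend statistic for failure-truncated data.

    Positive values suggest failures are arriving faster over time
    (deterioration); negative values suggest improvement; values near
    zero are consistent with a constant rate of occurrence.
    """
    cumulative = []
    total = 0.0
    for gap in times_between_failures:
        total += gap
        cumulative.append(total)
    n = len(cumulative)
    if n < 2:
        raise ValueError("at least two failures are needed")
    last = cumulative[-1]
    mean_earlier = sum(cumulative[:-1]) / (n - 1)
    return (mean_earlier - last / 2.0) / (last * math.sqrt(1.0 / (12.0 * (n - 1))))

# Hypothetical hours between successive failures, in the order they occurred.
chronological = [120.0, 100.0, 90.0, 70.0, 50.0, 40.0, 25.0, 15.0]
by_magnitude = sorted(chronological)

print(f"chronological order: U = {laplace_statistic(chronological):+.2f}")
print(f"sorted by magnitude: U = {laplace_statistic(by_magnitude):+.2f}")
```

The same eight numbers indicate a deteriorating system in chronological order and an apparently improving one once sorted, so a model fitted to the sorted data describes a system that does not exist.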

Since  the  process  of  probabilistic  analysis  has  been  introduced  considering 
statistical  distributions  of  all  (known)  contributing  factors,  the  key  question  remains  — 
“What  constitutes  acceptable  risk?”.  Considering  the  possible  errors  in  risk 
assessment,  the  pilots  of  the  Eastern  L-1011  would  likely  say  the  “six  nine”  reliability 
was  not  acceptable.  However,  this  is  the  risk  that  they  (unknowingly?)  accept  every 
time  they  climb  into  an  aircraft  [29]. 


2.2.2  Analysis  &  Recording  Data 

Data  used  for  analysis  can  be  obtained  either  from  tests  on  prototype  or 
production  models  or  from  the  field.  In  either  case,  some  means  of  accurate  recording 
of  this  data  must  be  available  or  errors  will  result  in  analysis  conclusions.  Most 
methods  of  recording  data  involve  human  interface  with  extensive  forms  such  as  the 
reliability  centered  maintenance  form  located  in  appendix  A.  Since  the  data  acquisition 
depends  on  persons  rather  than  equipment,  errors  often  occur  due  to  omissions  and 
misinterpretations  which  can  be  traced  back  to  motivation,  training,  and  diagnosability. 

If the maintenance technician can see no purpose in recording the information,
especially under poor working conditions, it is likely that items will be omitted or
recorded incorrectly. Once a failure report has left the initial recorder, the possibility
of verification is very much reduced, especially due to the high cost of man-hours. These
conditions increase the probability of recording a failure when no failure exists (a
non-failure). The testing and replacing of no-fault items or LRUs because of
convenience or previous experience is a likely cause for this. Also, when multiple
faults occur, a technician may record a secondary failure as a primary failure. All of
these errors in recording cause artificial inflation of failure rate data. Training and
motivation through knowledge can inhibit these errors immensely, yet can never totally
remove incidents of incorrect failure recording [4]. Improved diagnosability can limit,
if not eliminate, replacing no-fault items as well as chronological recording errors.


2.2.3 Analysis & Methodologies

Several  popular  diagnostic  analysis  testing  techniques  have  emerged  based  on 
particular  environments.  Especially  with  the  advent  of  the  digital  computer,  these 
techniques  have  reduced  many  sources  of  error,  but  are  not  totally  without 
disadvantages.  To  minimize  errors,  testing  needs  to  follow  certain  methodologies  as 
well  as  use  the  best  analysis  equipment  for  the  particular  application. 


2.2.3.1 Testing procedure

Several papers have been written addressing the subject of element, or LRU,
checking order. With optimality based on cost, all analyses converge on the following
general principle: check first the LRU that minimizes

$$\tau q / p \qquad (10)$$

where $\tau$ is the testing cost, $q$ is the probability that the LRU is good, and $p$ is the
probability that the LRU is bad [30]. Using this procedure can optimize diagnosability,
yet, once again, statistics are only averages based on historical data at best.
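
The ordering rule of equation (10) is straightforward to apply once costs and probabilities are tabulated. In the sketch below the LRU names, testing costs, and fault probabilities are hypothetical, and q is taken as 1 - p for each LRU, which is an assumption consistent with the definitions above.

```python
def checking_order(lrus):
    """Return LRU names sorted by the ratio tau * q / p, smallest first.

    `lrus` maps a name to (tau, p): the cost of testing the LRU and the
    probability that it is the faulty one. q is taken as 1 - p.
    """
    def ratio(item):
        _, (tau, p) = item
        if not 0.0 < p <= 1.0:
            raise ValueError("p must be in (0, 1]")
        return tau * (1.0 - p) / p
    return [name for name, _ in sorted(lrus.items(), key=ratio)]

# Hypothetical LRUs: name -> (testing cost in minutes, probability of being bad)
candidates = {
    "pump": (30.0, 0.45),
    "valve": (5.0, 0.20),
    "controller": (10.0, 0.30),
    "sensor": (2.0, 0.05),
}
print(checking_order(candidates))
```

In this example the most likely culprit, the pump, is not checked first, because its test is expensive relative to the cheaper checks on the valve and controller.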

Simulated  natural  and  induced  environmental  tests  have  been  developed 
scientifically  or  through  trial  and  error  to  provide  laboratory  conditions  comparable  to 
actual  field  test  conditions  if  field  data  is  not  available.  The  procedure  for  diagnostics 
remains  the  same  for  both  conditions,  yet  checklists  have  been  developed  to  specify  and 
calibrate  the  transducers  used  and  minimize  unwanted  “noise”  in  the  test  environment. 

Checklists have been developed for several diagnostic tests including
temperature, humidity, mechanical shock, vibration, sunshine, dust, rain, and explosive
environments. For example, the checklist of specification considerations for a
temperature test includes: the test temperatures and their tolerances; exposure time and
its tolerance (10% of duration recommended); protection against moisture condensation
and frost; functionality desired; relative humidity; the number of sensors and their
locations; and the initial temperature of the product at the start of the test [14].

The  transducers  used  for  instrumentation  in  the  tests  need  to  be  considered 
according  to  the  specifications  required.  For  instance,  the  decision  to  use  a 
piezoelectric  instead  of  a  strain  gauge  accelerometer  for  a  mechanical  shock  test 
involves  required  specifications  such  as  sensitivity,  linearity,  and  frequency  response. 


2.2.3.2 Testing equipment

The actual diagnostic equipment used today has been greatly influenced by the
evolution of the digital computer to keep up with the advances of the products it is
diagnosing. The advent of analysis techniques such as the FFT (fast Fourier transform)
has also revolutionized diagnosability, as has BITE (built in test equipment)
technology. An example lies in the arena of rotating machinery, but can be applied to
any system. Traditionally, vibration monitoring and protection equipment has been
totally separate from the diagnostic and data acquisition equipment. Multiple
microprocessors now virtually eliminate this barrier and can answer diagnostic
questions in “real time”. Questions include: is the data believable? to what accuracy?;
can I continue to run the machine? for how long? at what speed?; what happened to the
machine?; when, where, and how did the malfunction occur?; for how long did it last?;
what was the sequence and correlation of events?; what is the past history?; what limits
were exceeded?; and, who can help? To answer these questions microprocessors
calculate peak-to-peak vibration and display it on bar graphs, perform DFT (discrete
Fourier transform), compare vibrations against stored alarm limits, trip defeat functions
for calibration and maintenance, calculate time to danger, measure transducer gap
voltages, perform self tests, and produce buffered output for test instruments. This is
all accomplished because of the microprocessor’s enhanced capabilities: speed due to
parallel channel monitoring; positive capture since the connection is permanent;
additional data availability such as time to danger; flexibility due to programming for
different functions; reliability since downstream failures do not impact upstream
functions; compatibility from the digital form of data; and self testing capabilities [15].

With the discovery of the FFT, processing time for time to frequency transformations
has been drastically reduced (from on the order of N² operations to on the order of
N log N for N samples), so “real time” diagnosis of systems can be accomplished.
Amplification of defects in rotational machinery is possible using the FFT on a
logarithmic scale or cepstrum analysis (the inverse Fourier transform of the logarithm
of the spectrum). These discoveries allow tracking of extremely slow changes in the
transfer function such as crack growth development [25]. A typical frequency-based
troubleshooting checklist is located in appendix A.
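
The monitoring calculations listed above (peak-to-peak level, a discrete Fourier transform, and comparison against a stored alarm limit) can be sketched as follows. The signal, sampling rate, and alarm limit are invented for the example, and a direct DFT is written out for clarity; a practical monitor would use an FFT routine instead.

```python
import cmath
import math

def dft_amplitudes(samples):
    """Single-sided amplitude spectrum by direct DFT (O(N^2), for clarity)."""
    n = len(samples)
    amplitudes = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        # DC and (for even n) Nyquist bins are not doubled.
        scale = 1.0 if k == 0 or (n % 2 == 0 and k == n // 2) else 2.0
        amplitudes.append(scale * abs(acc) / n)
    return amplitudes

SAMPLE_RATE = 256   # samples per second (assumed)
N = 256             # one second of data, so each bin is 1 Hz wide
ALARM_LIMIT = 1.5   # hypothetical peak-to-peak alarm limit

# Hypothetical signal: 30 Hz running-speed component plus a small 90 Hz component.
signal = [math.sin(2 * math.pi * 30 * i / SAMPLE_RATE)
          + 0.2 * math.sin(2 * math.pi * 90 * i / SAMPLE_RATE)
          for i in range(N)]

peak_to_peak = max(signal) - min(signal)
spectrum = dft_amplitudes(signal)
dominant_hz = max(range(len(spectrum)), key=spectrum.__getitem__) * SAMPLE_RATE / N

print(f"peak-to-peak: {peak_to_peak:.2f} (alarm: {peak_to_peak > ALARM_LIMIT})")
print(f"dominant frequency: {dominant_hz:.0f} Hz")
print(f"amplitude at 90 Hz: {spectrum[90]:.2f}")
```

The peak-to-peak value alone only says that a limit was exceeded; the spectrum separates the running-speed component from the smaller 90 Hz component, which is the kind of detail needed to isolate a cause.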

Malfunctions, such as bearing deterioration, can be discovered using various
equipment, with advantages and disadvantages influencing error and cost for each. For
instance, if the human ear is the only diagnostic source for detecting a malfunction, the
time remaining before failure will likely be rather short, but the cost of equipment will
be quite small. If a stethoscope is added, the costs rise to approximately $300, but
detection is sooner. The errors involved in any sound method include subjectivity,
inaccuracy in trend analysis because of no hard copy readings, and lack of severity
detection. Temperature methods, such as portable pyrometers or permanently installed
thermocouples, are relatively inexpensive, but the detection is often too late to replace
the malfunctioning part during scheduled downtime and the analysis is often in error
since temperature varies with load. Vibration methods are generally very expensive (real
time analyzers start at $8500) yet have a proven track record of early detection if used
properly (see figure A2 in the appendix). Lack of training can result in error with the
vibration method [4]. Still other methods include ultrasonic, shock pulse, spike energy,
acoustic emission, and fiber optics, each with probable sources of error and definite
application strengths.

In order to prevent systems from proving that statistics are only averages and
failing when “on duty”, choices are available to minimize the potential for error
through diagnosability. Assumptions in statistical methods, recording techniques, and
methodologies including testing and equipment are all variables to optimize.


2.3  Diagnosability  &  Design 

Diagnostic  equipment  and  tools  available  today,  in  general,  are  limited  to  after- 
the-design  add-ons  like  BITE  technology  (which  add  weight  and  volume)  or 
maintenance  personnel  tools  (which  many  times  require  system  shutdown  for  analysis). 
Since  the  quality  of  a  product  is  determined,  to  a  great  extent,  during  the  design  phase 
rather  than  during  production  [11]  and  if  both  cost  and  analysis  are  functions  of 
diagnosability,  design  techniques  should  be  explored  to  maximize  the  diagnosability 
inherent  in  the  product  —  keeping  add-on  diagnostic  systems  to  a  minimum. 


2.3.1  Traditional  Design 

The cost of the unjustifiable removals on the 747 noted earlier was $100 per
flight hour, “a cost equivalent to adding 8 tons of dead weight to the aircraft,” directly
attributed to poor diagnosability with respect to the components that were removed
[11]. Traditionally, diagnosability has been an afterthought of product development.

Problems in both electronic and mechanical systems are addressed by adding
sensor based systems such as automatic test equipment (ATE) and BITE [11]. These
require communication networks as a means of telemetry to correlate and analyze
diagnostic data from different parts of the system where
“smart sensors”, like those discussed in section 2.2.3, monitor target parameters.
These add-ons add not only weight and volume (severely detrimental to businesses like
Boeing), but complexity as well, likely reducing reliability because the diagnostic
equipment itself can fail.

Another  common  approach,  used  alone  or  in  conjunction  with  add-on 
equipment,  is  removing  and  servicing  equipment  on  a  cyclical  basis  based  on  mean 
time  between  failures  and  other  trend  analysis  statistics  [6]. 

One  reason  fault  diagnosis  is  not  considered  explicitly  until  late  in  the 
production  process  is  that  diagnosability  is  difficult  for  the  designer  to  consider  without 
actual  maintenance  data  [11].  Certainly,  there  must  be  some  way  to  design  for 
reliability  through  diagnosability  without  overdesigning  as  with  the  historically  noted 
One-Hoss-Shay. 


2.3.2 Diagnosability Factors in Design

Several factors can be used to compare competing designs with respect to
diagnosability and decide what parts of a system could be improved in the design
phase. Included in these factors are the placement of parts based on function (and
reliability if known), the placement and choice of sensors, and the redundancy of
sensing operations and LRUs.

Based  on  equation  (8)  of  section  2.1.2,  diagnosability  time  is  directly 
proportional  to  the  time  until  initial  detection,  the  average  number  of  candidates  for  a 
given  failure,  and  the  distinguishability  between  the  candidates.  The  time  until  initial 
detection  is  a  function  of  the  detection  equipment  of  the  particular  LRU  and  can  be 
modified  using  techniques  discussed  in  section  2.2.3  based  on  the  criticality  of  the  part 
and  its  probability  of  failure. 

From previous work (Clark, 1993) the average number of candidates for a given
failure can be expressed as

$$\bar{c} = \frac{1}{n} \sum_{i=1}^{n} c_i \qquad (11)$$

where $c_i$ is the number of candidates for each failure indication, $i$, summed over the
total number of different failure indications, $n$ [11]. It has been said that the maximum
number of candidates for a particular failure is a measure of the ambiguity of a system,
so LRUs with a high $\bar{c}$ may confound diagnosis, especially if the same LRUs have a
high probability of failure. Decreasing $\bar{c}$ can be accomplished by placing particular
units in different locations or changing sensor dependencies.

The  measure  of  distinguishability  can  be  expressed  as 


$$D = \frac{\sum_{i=1}^{n} (\ldots)}{n\,(1 - 1/c)} \qquad (12)$$


where $n$ is the total number of possible indicated failures, $c$ is the total number of
candidates in the system, and $c_i$ is the number of candidates for each failure, $i$ [11].
This equation shows that a distinguishability of one, or 100%, means that every
possible indicated failure would have only one candidate and diagnosis is trivial,
whereas a distinguishability of zero means that for any failure, all LRUs in a system are
candidates, i.e., poor diagnosability [11]. Improving $D$ can be accomplished by, once
again, decreasing the total number of candidates and/or decreasing the complexity of
the total system.
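
The two measures can be computed directly from a table of failure indications and their candidate LRUs. The sketch below is illustrative only. The failure indications and LRU names are hypothetical, and c is taken as the number of distinct candidate LRUs in the system. The distinguishability function assumes a summand of (1/c_i - 1/c), which is one normalization that reproduces the limiting cases described above (D = 1 when every indication has a single candidate, D = 0 when every indication implicates all c LRUs). It should be checked against reference [11] before being relied on.

```python
def average_candidates(candidates_by_indication):
    """Equation (11): mean number of candidate LRUs per failure indication."""
    counts = [len(c) for c in candidates_by_indication.values()]
    return sum(counts) / len(counts)

def distinguishability(candidates_by_indication):
    """Distinguishability D on a 0-to-1 scale.

    ASSUMPTION: the summand is (1/c_i - 1/c), normalized by n * (1 - 1/c),
    where c is the number of distinct candidate LRUs in the system.
    """
    counts = [len(c) for c in candidates_by_indication.values()]
    n = len(counts)
    c = len(set().union(*candidates_by_indication.values()))
    if c < 2:
        return 1.0  # a single LRU can never be ambiguous
    return sum(1.0 / ci - 1.0 / c for ci in counts) / (n * (1.0 - 1.0 / c))

# Hypothetical system: failure indication -> set of candidate LRUs
system = {
    "low pressure": {"pump", "valve", "filter"},
    "no flow": {"pump", "motor"},
    "overheat": {"motor"},
    "leak": {"valve", "filter", "pump", "motor"},
}
print(average_candidates(system))
print(round(distinguishability(system), 3))
```

Removing a candidate from any indication, for example by adding a sensor dependency that rules out the filter for "low pressure", lowers the average and raises D under this assumed form.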

A  popularized  factor  for  increasing  the  reliability  of  a  system  is  the  use  of 
parallel  linked  redundancy  of  LRUs  versus  series  linked  components.  By  inspection, 
systems  with  LRUs  linked  in  series  have  a  failure  rate  equal  to  the  sum  of  the  failure 
rates  of  each  LRU.  Parallel  linked  systems  decrease  the  failure  rate.  For  example,  the 
mean  time  between  failures  for  an  equivalent  system  with  two  LRUs  in  parallel  can  be 
expressed  as 

$$\frac{1}{\lambda} = \frac{1}{\lambda_1} + \frac{1}{\lambda_2} - \frac{1}{\lambda_1 + \lambda_2} \qquad (13)$$

If the failure rates $\lambda_1$ and $\lambda_2$ of the two components are equal, equation (13)
reduces to

$$\frac{1}{\lambda} = \frac{3}{2\lambda_1} = \frac{3}{2}\,\theta \qquad (14)$$

where $\theta$ is the mean time between failures of each LRU [21]. Figures 10 and 11 show
the relationships for series and parallel systems, respectively.


Figure 10. Series system reliability [18]


Figure  11.  Parallel  system  reliability  [18] 
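
The series and parallel relationships above can be verified numerically. The following sketch assumes constant (exponential) failure rates and active redundancy, in which both LRUs operate from the start and the system fails only when both have failed; the failure rates are hypothetical.

```python
def series_failure_rate(rates):
    """Failure rate of LRUs in series: the sum of the individual rates."""
    return sum(rates)

def parallel_mtbf(rate_1, rate_2):
    """Equation (13): MTBF of two LRUs in active parallel redundancy."""
    return 1.0 / rate_1 + 1.0 / rate_2 - 1.0 / (rate_1 + rate_2)

rate = 0.001          # assumed failures per hour for each LRU
theta = 1.0 / rate    # MTBF of a single LRU

print(1.0 / series_failure_rate([rate, rate]))  # series pair: half of theta
print(parallel_mtbf(rate, rate))                # parallel pair: 1.5 times theta
```

For identical LRUs, doubling the hardware in parallel raises the MTBF by only 50 percent.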


However, improving reliability through redundancy is a method subject to
restrictions. In electrical and mechanical systems the performance parameters of a
combination of LRUs are not the same as for the original component alone, and the
degraded performance of the system after one LRU fails is likely to be less than that of
the parallel combination. It should be emphasized again that redundancy, like
BITE, carries the penalty of added space, weight, power supply, and cost as well as the
possibility of more maintenance activities.

Efforts to enhance reliability through complexity quickly reach a point of
diminishing returns from the diagnosability point of view.

The previous considerations for improvement have been limited by the
functionality requirements of the system as well as the other factors in design. If a
system must be configured in such a way that changing LRU positions is impossible,
then placement and type of sensor associated with each LRU function can be optimized
in lieu of merely adding sensors (and weight and complexity). One study utilizes the
minimization of a positive definite scalar measure of the covariance matrix as an
optimality criterion for sensor locations based on minimizing sensor uncertainties [26].
The idea of “smart sensors” implies the sensor, along with a microprocessor, makes the
diagnostic decisions itself [6]. Of course, the weight and volume capacity of the
system and LRU may determine just how “smart” a sensor can be.

Sensor placement can also be a factor of interfering inputs. External or internal
“noise” associated with system operating conditions can contribute to false
out-of-tolerance readings or mask true out-of-tolerance signals. These phenomena
either increase unjustifiable removals or allow LRU failure without prior notice,
respectively. Placement for minimum interference or use of filters to eliminate excess
noise can increase diagnosability without additional complexity.


2.3.3 Design for Diagnosability

While some systems incorporate microprocessors programmed to test and isolate
faulty LRUs and even switch to backup devices, most require fault isolation provisions
like accessible probes or connectors called test points. Test points provide an interface
between test equipment and the system for the purpose of diagnosis, adjustment, and
monitoring of performance. The provision of test points is governed by the level of
LRU chosen and will usually not extend beyond what is required to isolate the
particular faulty LRU [4].


2.3.3.1 Testability

To minimize the possibility of faults being caused by maintenance activities, test
points must be in standardized positions within the circuit and buffered by capacitors and
resistors to protect the system from misuse of test equipment. Enough space should be
provided to allow for test probes of the test equipment. As with BITE, reliability of the
test equipment should be an order of magnitude better than the system. Additional
strategies to assess design effectiveness for testability can be found in MIL-STD-2165
[24]. The standardization of probes reduces the amount of test equipment as well as
lessens the probability of having the wrong test gear. It should be noted that additional
unnecessary test points are likely to impair rather than improve system diagnosability
and therefore must be chosen carefully in the design phase.


2.3.3.2 Ease of maintenance

Several design considerations can ease maintenance actions and improve
diagnosability. First, if at all possible, minimize the need for maintenance. For
example, development of electronic fuel injection in automobiles has eliminated the
need to check the distributor condition [24].

Many  additional  items,  similar  to  DFA  (design  for  assembly)  goals,  reflect  the 
human  factor. 

Accessibility refers to fasteners and covers as well as position of mounting
relative to other parts. Parts should be easily removable with features such as quick
disconnect plugs for hydraulic and electrical parts, yet technicians should be
discouraged from removing and checking easily exchanged items as a substitute for the
correct diagnostic procedure. This can be accomplished by the choice of connections
of the particular LRU, which presents the classic trade-off between reliability and
maintainability. A high reliability LRU which is unlikely to require replacement could
be connected by a wrapped joint, whereas a low reliability LRU could be connected by
a less reliable plug and socket for quick exchange [4].

The  amount  of  adjustment  required  during  diagnosis  can  be  minimized  by 
generous  tolerancing  during  the  design.  Guide  holes  for  adjustment  tools  and  visible 
displays  are  also  helpful  for  avoiding  damage  to  the  equipment  and  monitoring 
adjustment  levels,  respectively  [4]. 

Design for off-line repair can increase the use of spares, but decreases
downtime immensely. Considerations here include the handling capacity and size of
the LRU. Good handling requires lightweight parts with handles to avoid equipment
damage as well as to protect personnel from sharp edges and high voltage sources (even
an unplugged module can hold dangerous charges on capacitors) [4]. Generally, as the
size of the LRU increases the reliability decreases and the cost of spares increases.

Several ergonomic factors influence diagnosability based on performance aids
and the environment. Since human short term memory has a capacity of
only about seven items of information, designs should require minimum tests for
diagnosis and minimum skill [11]. Overminiaturization should be avoided if possible.
Environmental conditions such as illumination, comfort, and safety in the form of body
positions and stress generating factors like weather, heat, vibration, and noise should
all be an integral part of design considerations [4]. Figure 12 illustrates how stressors
such as temperature can affect diagnosability.


Figure 12. Effect of temperature on number of mistakes [24]


A complete checklist of diagnosability design with respect to human factors can
be found in reference [31].


2.3.4  Selection  of  Designs 

Design selection, from the earliest stages of concept development, should
consider every aspect of diagnosability improvement introduced in the previous sections.
From the LRU to the entire system configuration, selection of particular designs can be
optimized using techniques involving life cycle costing based on historical and
predicted data, mathematical prediction models based on advances in diagnosability
technology, and screening methods using prototype or production parts.

As  noted  previously,  life  cycle  costing  provides  essential  comparisons  between 
existing  system  architectures  based  on  historical  field  data  and  design  phase  concepts 
based  on  prediction  techniques.  Using  cost  of  diagnosability  as  the  common  metric, 
the  optimal  system  design  can  be  chosen  from  a  set  of  limited  choices. 

Mathematical  prediction  models  are  used  extensively  to  weigh  the  savings  of 
discrete  advances  in  diagnosability  technology.  For  instance,  one  study  developed  a 
mathematical  model  for  predicting  impact  on  maintenance  man-hours  of  on-board  test 
equipment  in  the  form  of  BITE  for  use  in  the  conceptual  design  of  aircraft  including  the 
USAF  Advanced  Tactical  Fighter  (ATF)  [17].  The  cost  and  performance  penalty  of 
incorporating  BITE  must  be  balanced  or  exceeded  by  cost  savings  in  support, 
manpower,  and  improvements  in  availability  to  justify  incorporating  this  technology  in 
the  design.  The  life  cycle  costing  mechanism  available  through  the  Boeing  Company  is 
called  the  DEPCOST  (dependability  cost)  model.  This  model,  available  for  use  on  the 
spreadsheet  program  Excel  4.0  or  higher,  incorporates  all  parameters  that  affect  the 
cost  of  an  aircraft  throughout  its  20  year  life  cycle. 

If actual products are available for testing, screening based on reliability and
diagnosability parameters can be accomplished using several techniques, including:
screening by truncation of distribution tails based on tolerance limits defined by a
normal distribution; “interference” between stress and strength distributions, again
using normal distributions of environmental stress and product strength to eliminate
products where intersections occur; burn-in screening to identify and eliminate products
with early failure mechanisms; and linear screening, which predicts early failures based
on a weighted average of early life parameters.
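
As a rough illustration of the interference screen, the overlap between the two normal distributions can be expressed as the probability that a randomly drawn stress exceeds a randomly drawn strength. The sketch below assumes the stress and strength are independent and normally distributed; the function name and the sample values are hypothetical and are not taken from the source.

```python
import math


def interference_probability(mean_stress, sd_stress, mean_strength, sd_strength):
    """Probability that stress exceeds strength for independent normal variables."""
    # The safety margin (strength - stress) is itself normally distributed.
    margin_mean = mean_strength - mean_stress
    margin_sd = math.sqrt(sd_stress ** 2 + sd_strength ** 2)
    # P(margin < 0) is the standard normal CDF evaluated at -margin_mean / margin_sd.
    z = -margin_mean / margin_sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical values in arbitrary but consistent units.
sample_overlap = interference_probability(100.0, 10.0, 150.0, 10.0)
```

A product population with a large interference probability would be a candidate for elimination under this screen; the threshold itself is a design decision that the source does not specify.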

Selection  of  designs  based  on  diagnosability  promises  to  move  today’s  products 
from  weighty/costly  add-ons  to  maintenance-friendly/efficient  machines  with 
diminishing  costs. 


3.0 DESCRIPTION AND MODELING OF THE BOEING 737-300 BLEED AIR
CONTROL SYSTEM


This section introduces the bleed air control system (BACS), including its major
LRUs and their indications. The scope of the analysis and all assumptions are
explicitly stated for the system. Modeling of the system is accomplished with the use
of a failure modes and effects analysis (FMEA) by Airesearch and maintenance
manuals provided by the Boeing Company. Failure combinations are incorporated in
similar fashion to previous research (Clark, 1993) for ease of comparison analysis and
application of system metrics. Though the 737-300 is singled out in this research, all
analyses and recommendations can be extended to the 400 and 500 models, since the
system is identical on those models.


3.1 Description of the Bleed Air Control System (BACS)


The BACS consists of two identical sets (one per engine) of valves, controls,
ducts, and a heat exchanger mounted in the engine nacelle area, as shown in figures 13
and 14.


Figure 13. 737-300 BACS component location - left view

Figure 14. 737-300 BACS component location - right view

Each  set  of  equipment  automatically  selects  the  engine  bleed  air  supply  from 
either  the  low-stage  (5th  stage)  or  high-stage  (9th  stage)  bleed  ports  and  regulates  the 
pressure  and  temperature  supplied  to  the  air-using  systems  such  as  cabin  air 
conditioning,  cargo  heating,  and  anti-ice. 

Bleed  air  from  the  5th  and  9th  stage  compressors  is  routed  through  a  heat 
exchanger,  called  the  precooler,  where  it  is  cooled  with  air  from  the  engine’s  fan. 
From  the  precooler,  the  air  continues  to  the  pneumatic  manifold  as  shown  in  figure  15. 


Figure 15. 737-300 BACS schematic

Since bleed air must be delivered to the pneumatic manifold within specific
temperature and pressure ranges to prevent under/overheat and under/overpressure
conditions, a number of valve and control systems are used for regulation.

During takeoff, climb, and most cruise and hold conditions, the pressure
available from the 5th stage is adequate to meet the air supply requirements.
During descent, approach, landing, and taxi conditions, 9th stage bleed air is required.
The  selection  of  the  bleed  supply  is  controlled  by  the  high-stage  valve  (HPSOV)  and 
the  high-stage  regulator  (HSreg)  setting.  The  HPSOV  is  responsible  for  regulating  and 
shutting  off  the  flow  of  9th  stage  engine  bleed  air  in  conjunction  with  signals  from  the 
remotely  located  HSreg  which  selects  the  proper  bleed  air  stage  as  necessary  to  satisfy 
system  requirements.  The  low  pressure  check  valve  (Check)  permits  the  flow  of  5th 
stage  bleed  air  and  prevents  higher  pressure  air  from  the  9th  stage  from  back  flowing 
into  the  5th  stage.  The  pressure  regulator  and  shutoff  valve  (PRSOV)  limits  bleed  air 
to  a  predetermined  pressure  level  for  the  system.  Secondarily,  the  PRSOV  works  in 
conjunction  with  the  450°F  thermostat  (Thermo)  as  a  flow  modulating  valve  to  limit 
downstream  temperature  within  a  maximum  upper  temperature  band  based  on  signals 
from the Thermo. A remotely located bleed air regulator (Breg) works with the
PRSOV to control the output pressure to a maximum and incorporates an overpressure
switch which activates the PRSOV to close in the event of extreme bleed pressure. The
precooler  control  valve  (FAMV)  controls  the  flow  of  fan  cooling  air  to  the  bleed  air 
precooler  (PCLR).  The  FAMV  modulates  in  response  to  pneumatic  control  pressure 
signals  from  a  remotely  located  precooler  control  valve  sensor  (PCLRsen)  to  maintain 
bleed  air  temperature  downstream  of  the  precooler  within  a  specified  range.  The 
PCLR  vents  excess  air  to  ambient  as  do  the  HPSOV  and  PRSOV  by  incorporating 
pressure  relief  valves  to  provide  additional  actuator  relief  in  the  event  of  transient 
overshoots.  All  components  are  connected  by  a  series  of  ducts  (duct). 

The BACS currently has five sensors, or indications, that are used to diagnose
system failures. These indications include 1) above normal readings on an analog
pressure gauge, 2) below normal readings on an analog pressure gauge, 3) bleed trip off
light illumination, 4) low cabin pressure on an analog pressure gauge, and 5) low cabin
temperature on an analog temperature gauge. All subsequent analyses refer to these
indications in the preceding numerical order, e.g., bleed pressure high and bleed trip off
together equal indication 13.
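
The numbering convention for combined indications can be sketched as follows. This is only an illustration of the convention described above; the function name is hypothetical, and the approach assumes single-digit indication numbers.

```python
def indication_code(active_indications):
    """Join active indication numbers in ascending order into one combined code."""
    digits = sorted(set(active_indications))
    if not digits or any(d < 1 or d > 9 for d in digits):
        raise ValueError("expects one or more single-digit indication numbers")
    return int("".join(str(d) for d in digits))


# Bleed pressure high (1) together with bleed trip off (3).
combined = indication_code([3, 1])
```

With indications 1 and 3 active, the combined code is 13, matching the example in the text.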


3.2 Scope and Assumptions of BACS Analysis


3.2.1  Scope 

The valves, controls, ducts, and systems making up the BACS and described in
the  previous  section  (parenthetically  denoted)  are  considered  LRUs  which  can  be 
replaced  on  the  repair  line  as  the  lowest  physical  level  of  replacement.  Each  LRU 
provides  a  function  for  the  system  that  can  be  measured.  The  five  indications  listed 
provide  the  performance  measures  of  each  LRU  individually  and  collectively  depending 
on  the  mode  of  operation  of  the  system.  An  example  is  the  HPSOV  providing  pressure 
to  the  system  measured  by  the  analog  pressure  gauge  on  the  pilot’s  overhead  panel. 
The  LRU,  HPSOV  in  this  case,  is  directly  associated  with  an  indication,  pressure  in 
this  case.  The  LRU  to  indication  relationship  is  causal  in  direction. 

Each indication, though, does not necessarily imply a causal relationship to an
LRU unless only one LRU could possibly have caused the indication, i.e., a
distinguishability of one (section 2.3.2). The process of diagnosis is one of
determining the set of parameters, or LRUs, of a system that have parameter measures,
or indications, that fall outside the desired (or necessary) design values. This indication
to LRU relationship is diagnostic in direction, and the resulting suspect LRUs are
called candidates [11].
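
The difference between the causal and diagnostic directions can be illustrated by inverting a causal map. The sketch below uses a made-up four-LRU system; the LRU names, indication numbers, and function names are hypothetical and do not come from the FMEA.

```python
from collections import defaultdict


def candidates_by_indication(causal_map):
    """Invert an LRU -> indications map into an indication -> candidate LRUs map."""
    diagnostic_map = defaultdict(set)
    for lru, indications in causal_map.items():
        for indication in indications:
            diagnostic_map[indication].add(lru)
    return dict(diagnostic_map)


def unambiguous_indications(diagnostic_map):
    """Return indications that point to exactly one candidate LRU."""
    return sorted(ind for ind, lrus in diagnostic_map.items() if len(lrus) == 1)


# Hypothetical causal direction: each LRU and the indications it can trigger.
example_causal_map = {
    "LRU_A": {1, 2},
    "LRU_B": {2},
    "LRU_C": {2, 3},
    "LRU_D": {4},
}
example_diagnostic_map = candidates_by_indication(example_causal_map)
```

In this made-up system, indication 2 has three candidates and is therefore ambiguous, while indications 1, 3, and 4 each isolate a single LRU.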

The scope of the BACS model is to define the LRU/indication relationships in such
a way as to incorporate all LRUs and indications in the system as well as all modes of
failure of each LRU. Successful completion of the model allows for systematic
changes to be incorporated and analyzed. Assumptions are made to simplify the
analysis and to provide consistency with a real system.


3.2.2  Assumptions 

As  opposed  to  previous  research,  this  analysis  incorporates  all  operating 
conditions  of  the  aircraft  at  once  since  the  information  from  all  engine  output 
conditions  is  realistically  available  to  maintenance  personnel.  To  move  beyond  the 
trivial,  proper  electrical  power  is  assumed  to  be  available  to  the  system,  a  failure  that 
has  no  indication  associated  with  it  is  not  considered,  and  an  indicator  failure  is  not 
considered  since  the  flight  crew  can  establish  its  validity.  Failure  of  circuit  protection 
is  not  considered.  Valve  port  leakage  and  external  leakage  are  not  considered. 

Only one LRU failure at a time is considered, i.e., failures are mutually exclusive, though
an analysis technique for dependent LRU failures (passive) is developed. All ducting is
considered to be one LRU. The failure rates experienced based on the FMEA and
Boeing’s Dependability Cost (DEPCOST) model are in the same proportion as those
predicted. Failure modes obtained from the FMEA for the BACS are the only failure
modes considered. Maintenance is performed in accordance with established
maintenance procedures and by personnel possessing appropriate skills and training.

Inputs to the BACS model are obtained through design standards and
engineering  judgment  if  not  stated  explicitly  by  the  Airesearch  FMEA  or  Boeing 
publications. 


3.3 Modeling of the Bleed Air Control System (BACS)

Failure mode information is available from the FMEA conducted on the 737-300
BACS, including probability assessments for each mode of failure. Mean time
between failures for each LRU is also available from a completed DEPCOST model based
on historical data and maintenance reviews for the system. Since an LRU can
fail in several ways, a “sometimes” relation was developed to represent failures
that only sometimes produce a given indication. The fault tree
analysis model of the BACS shown in figure 16 incorporates both always and
sometimes relations, depicted as solid and dashed lines, respectively. Due to space
constraints, the LRU failures (rectangles) are placed both above and below the
indications (ovals).


With  this  defined  system  model,  metrics  can  be  developed  to  compare  different 
systems  that  perform  the  same  function  by  totally  different  designs  or  by  reassigning 
LRU-indication  relationships.  Refining  previous  research  metrics  (Clark,  1993)  to 
address  dependent/passive  failures  and  defining  a  prediction  method  to  determine  mean 
time  between  unscheduled  removals  (MTBUR)  leads  to  a  redesign  methodology  based 
on diagnosability. By incorporating these prediction metrics into the DEPCOST life cycle
costing model, total diagnosability cost savings can be determined.


4.0 DIAGNOSABILITY METRICS AND REDESIGN METHODOLOGY


For  diagnosability  to  be  considered  in  the  design/redesign  process,  there  must 
be  some  way  to  predict  how  system  changes  will  affect  system  parameters  for 
comparing  competing  designs  with  respect  to  diagnosability.  A  methodology  based  on 
the  prediction  technique  must  be  arrived  at  for  use  in  determining  what  parts  of  the 
system  should  be  changed  to  improve  diagnosability.  In  section  4.1,  metrics  from 
previous  work  are  extended  to  measure  the  relative  diagnosability  of  systems  with  LRU 
failures  that  are  dependent/passive.  A  prediction  metric  based  on  unjustified  removals 
and  time  is  introduced  in  section  4.2.  A  design  change  methodology  is  discussed  in 
section  4.3. 


4.1  Dependent  Failures 

As  noted  from  previous  work  (Clark,  1993),  determining  which  LRUs  are 
difficult  to  diagnose  is  not  complex.  By  examining  the  fault  tree  analysis  model  of 
figure  16,  a  list  of  all  possible  failures  and  their  corresponding  candidates  can  be 
assembled.  It  may  seem  that  the  greater  number  of  times  a  certain  LRU  appears  as  a 
candidate,  the  harder  it  is  to  diagnose.  Yet,  if  that  particular  candidate  is  the  only 
candidate  for  many  of  its  failure  modes  it  does  not  present  a  diagnostic  challenge  at  all. 
Moreover,  even  if  a  certain  LRU  is  hard  to  diagnose,  it  may  be  of  little  concern  if  its 
failure  is  very  unlikely  to  occur  [11]. 

Taking  the  above  factors  into  consideration,  equation  12  of  section  2.3.2  was 
modified  to  reflect  the  probability,  or  failure  rate,  of  each  particular  LRU  as  shown  in 
equation  14  as  weighted  distinguishability  [11]. 

    WD = \frac{\sum_{i=1}^{n} \cdots}{(1 - 1/C) \sum_{i=1}^{n} PF_i}    (14)

PF_i is the probability of LRU failure as defined by equation 15.

    PF_i = 1 - \prod_{j \in candidates} (1 - PC_j)    (15)

PC_j is the probability of failure of each of the candidate LRUs for a given indication.

Weighted  distinguishability,  like  distinguishability,  varies  from  zero  to  one,  but 
provides  a  more  realistic  approach  to  system  diagnosis  comparisons. 

Metrics defined up to this point have been derived from a mutually exclusive
standpoint with respect to failures, i.e., only one LRU failure occurs at a time to
produce a given failure indication. Realistically, this is not always the case. In fact,
the 737-300 FMEA incorporates a section of passive LRU failures that, in conjunction
with certain other passive failures, activate a failure indication; these LRU
failures are therefore dependent.

Since  merely  the  incidence  of  one  passive  failure  will  not  generate  a  failure 
indication, the definition of PF_i for use in equation 14 should be expanded to
incorporate  dependent  failures  such  as  that  depicted  in  the  fault  tree  analysis  model  of 
figure  17  if  one  or  more  passive  LRU  failures  are  to  be  modeled. 


Figure 17. Sample fault tree analysis including independent and dependent sources [10]

Equation 15 essentially defines the additive rule of probability. Incorporating a
dependent  passive  event  such  as  fault  C  in  figure  17  requires  the  use  of  the 
multiplicative  rule  of  probability.  For  such  modeling,  equation  16  is  suggested  for  use 
in  equation  14. 


    PF_i = (expression in \prod_{candidates} (1 - PC_j), \prod_{candidates} PC_{k1}, and \prod_{candidates} PC_{k2}; the original layout is not recoverable)    (16)


Once again, all PC terms are the probabilities of failure of each of the candidate LRUs
for a given indication, now distinguished by dependency. PC_j is independent, PC_{k1} has an
“embedded” dependency, and PC_{k2} has an “extended” dependency. Figure 17 models
an extended dependency of fault C. If the “and” and “or” gates were
switched, however, the dependency would be embedded between faults A and B. Of course, the
PC_k terms are only utilized if the model embodies them; otherwise they are discarded
and equation 15 suffices.
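
The additive and multiplicative rules can be sketched for a three-fault example. The code below assumes that figure 17 joins faults A and B through an “or” gate whose output is combined with fault C through an “and” gate, and that all fault events are statistically independent; both the gate arrangement and the function names are assumptions made for illustration, not a reproduction of equation 16.

```python
import math


def or_gate(probabilities):
    """Additive rule: probability that at least one independent event occurs."""
    return 1.0 - math.prod(1.0 - p for p in probabilities)


def and_gate(probabilities):
    """Multiplicative rule: probability that all independent events occur."""
    return math.prod(probabilities)


def extended_dependency(p_a, p_b, p_c):
    """(A or B) and C: fault C is required in addition to A or B."""
    return and_gate([or_gate([p_a, p_b]), p_c])


def embedded_dependency(p_a, p_b, p_c):
    """(A and B) or C: the gates are switched, so A and B depend on each other."""
    return or_gate([and_gate([p_a, p_b]), p_c])
```

With hypothetical probabilities of 0.1, 0.2, and 0.5 for faults A, B, and C, the extended arrangement gives 0.28 x 0.5 = 0.14 and the embedded arrangement gives 1 - (0.98 x 0.5) = 0.51.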

Though the analysis of the passive failures in the 737 BACS is not
included in the scope of this research, weighted distinguishability can now be applied
to virtually any system modeled by fault tree analysis.


4.2  Mean  Time  Between  Unscheduled  Removals  (MTBUR) 

Described by Boeing as the “single most important input” in the DEPCOST
model, MTBUR has been targeted by this research as the overriding prediction
parameter of diagnosability. For an aircraft system, MTBUR is defined as the average
number of unit flight hours occurring between unscheduled removals of an LRU.
Mathematically,  it  is  the  inverse  of  the  LRU  removal  rate.  Reliability  mathematics  and 
labor  time  are  the  key  contributors  to  the  derivation  of  the  predicted  MTBUR  based  on 
LRU  failure  rates  and  system  structure. 

Though the normal distribution is capable of describing most mechanical part
lives, the scheduled maintenance overhaul and replacement times are assumed to be
within the middle portion of the curves shown in figure 6 of section 2.2.1. Therefore,
the exponential distribution of equation 9 is used in all subsequent analysis, assuming a
constant, or near constant, failure rate. The structure of a system is most readily
evaluated in terms of times to complete maintenance actions. Constant working
conditions in the context of human factors, as well as proper
experience and training, are assumed. Equation 2 is used to define mean time between
failures (MTBF) to avoid redundancy in the calculations by accounting for existent
false alarms. The analysis also assumes a certain degree of maintenance technician
knowledge prior to diagnosis based on the principle of optimum checking order
(equation 10). In this case the cost factor is in the form of line labor hours.
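
The constant failure rate assumption can be illustrated with the standard exponential reliability function. Equation 9 is not reproduced in this section, so the sketch below assumes it has the usual form R(t) = exp(-t / MTBF); the function names are hypothetical.

```python
import math


def failure_rate(mtbf_hours):
    """Constant failure rate implied by a given MTBF (failures per hour)."""
    return 1.0 / mtbf_hours


def reliability(t_hours, mtbf_hours):
    """Probability of surviving t_hours under a constant failure rate."""
    return math.exp(-t_hours * failure_rate(mtbf_hours))
```

Under this assumption, an LRU operated for a time equal to its MTBF has a survival probability of exp(-1), or roughly 37 percent.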

From a generic FMEA, a fault tree analysis model can be assembled to include
the failure rate of not only the LRU, but also the mode in which it fails. Therefore, a
particular failure indication rate can be assessed by summing the failure rates of all
LRUs with a common indication:

    \sum_{i=1}^{n} failrate_{LRU_i \cap ind_j} = failrate_{ind_j}    (17)

given ind_j is the common indication. Since maintenance technicians work in the
diagnostic direction, this indication failure rate is a necessary starting point.
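
The summation of equation 17 amounts to grouping failure mode rates by indication. A minimal sketch, assuming each failure mode is recorded as an (LRU, indication, failure rate) tuple; the record layout, function name, and sample rates are hypothetical.

```python
def indication_failure_rates(mode_rates):
    """Sum failure-mode rates by indication.

    mode_rates: iterable of (lru_name, indication, rate) tuples, where rate is
    the failure rate of that LRU in a mode producing that indication.
    """
    totals = {}
    for _lru, indication, rate in mode_rates:
        totals[indication] = totals.get(indication, 0.0) + rate
    return totals


# Hypothetical failure-mode rates in failures per flight hour.
example_rates = indication_failure_rates([
    ("LRU_A", 1, 2e-5),
    ("LRU_A", 2, 1e-5),
    ("LRU_B", 2, 4e-5),
])
```

In this example, indication 2 collects contributions from both LRUs, so its rate is the sum of the two mode rates.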

In the science of diagnostics, an LRU will be removed in one of two conditions:
failed or not failed. Removal in the failed condition can be predicted directly from the
reliability  of  the  LRU  and  is  justified.  Removal  in  the  not  failed  condition,  or 
unjustified  removal,  is  a  function  of  the  probability  of  detecting  the  wrong  LRU  and 
the  time  it  will  take  to  repair  it  as  well  as  how  often  the  other  LRU  candidates  for  that 
indication  fail.  Equation  18  defines  the  prediction  metric  for  total  MTBUR  of  an  LRU. 


    MTBUR_{tot} = \frac{1}{1/MTBUR_{un} + 1/MTBUR_{j}}    (18)

MTBUR_j is the mean time between justified unscheduled removals of an LRU and is
equal to the MTBF of that particular LRU. MTBUR_{un} is the mean time between
unjustified unscheduled removals, defined by the mean time between failures of all other
candidate LRUs divided by the probability of detecting the particular LRU
in question (PD_i):

    MTBUR_{un} = \frac{MTBF_{other candidates}}{PD_i}    (19)


where PD_i is defined by

    PD_i = \frac{PC_{i \cap ind_j}}{LLHPR + SLHPR}    (20)

where PC_{i \cap ind_j} is the probability of a particular LRU failing in a mode that incites a
given failure indication (generated from failrate_{LRU_i \cap ind_j}), LLHPR is the line
labor hours per removal of the particular LRU, and SLHPR is the shop labor hours per
removal of the particular LRU. Both time variables are retrieved from maintenance log
books and historical data.

For  a  complete  prediction  of  the  total  MTBUR  of  a  particular  LRU  in  a  system, 
equation  19  is  inverted  for  each  indication  to  find  the  unjustified  removal  rate  and  then 
added  to  the  others  to  find  the  total  unjustified  removal  rate  of  the  particular  LRU.  The 
total unjustified removal rate is then inverted to find the total MTBUR_{un}, which is
applied  to  equation  18.  Examples  of  the  MTBUR  predictions  are  found  in  section  5.0 
as  well  as  a  detailed  spreadsheet  analysis  located  in  appendix  B. 
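
The aggregation procedure just described can be sketched as follows. The function takes the per-indication unjustified MTBUR values (the results of equation 19) as given inputs and combines them with the MTBF according to equation 18; the function name and the sample values are hypothetical.

```python
def total_mtbur(mtbf_hours, unjustified_mtburs_by_indication):
    """Combine justified and unjustified removal rates into a total MTBUR.

    mtbf_hours: MTBF of the LRU, taken as its justified MTBUR.
    unjustified_mtburs_by_indication: one unjustified MTBUR value (hours)
    for each indication in which the LRU appears as a candidate.
    """
    # Invert each per-indication value and add the resulting rates.
    unjustified_rate = sum(1.0 / m for m in unjustified_mtburs_by_indication)
    justified_rate = 1.0 / mtbf_hours
    # Equation 18: the total removal rate is the sum of both rates.
    return 1.0 / (justified_rate + unjustified_rate)


# Hypothetical LRU: MTBF of 10000 hours, candidate for two indications.
example_total = total_mtbur(10000.0, [20000.0, 20000.0])
```

For these hypothetical inputs the unjustified removal rate equals the justified rate, so the predicted total MTBUR is half the MTBF, or 5000 hours.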


4.3  Design  Change  Methodology 

The MTBUR prediction metric serves as a standard for change when comparing
competing designs. Analogous to the Service Modes Analysis (SMA) developed as a
methodology for design changes based on serviceability [12], design/redesign based on
the MTBUR prediction metric should focus on the following system changes:

1. LRUs with a high failure rate λ and low MTBUR.

2. LRUs with high spare costs.

3. LRUs included with highly ambiguous indications (high c).

4. LRUs with room for improvement (MTBF - MTBUR > 10000 hrs).

5. Candidate combinations that will increase the “overall” system MTBUR
(especially the MTBUR of high cost LRUs).

6. Indications with a high failure rate (failrate_{ind_j}).

Feasibility  of  system  changes  in  terms  of  complexity  of  LRUs  and  their  functions 
should  also  be  noted  for  cost  optimality. 
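
As one example of how these criteria might be applied, the sketch below screens LRUs against the room-for-improvement threshold of item 4. The data layout, function name, and sample values are hypothetical; the other criteria would need cost and ambiguity data that are not modeled here.

```python
ROOM_FOR_IMPROVEMENT_HOURS = 10_000  # threshold from item 4 of the list


def redesign_targets(lrus):
    """Return (name, gap) pairs where MTBF - MTBUR exceeds the threshold.

    lrus: dict mapping LRU name to a (mtbf_hours, mtbur_hours) tuple.
    Results are sorted with the largest gap first.
    """
    flagged = [
        (name, mtbf - mtbur)
        for name, (mtbf, mtbur) in lrus.items()
        if mtbf - mtbur > ROOM_FOR_IMPROVEMENT_HOURS
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical MTBF and MTBUR values in flight hours.
example_targets = redesign_targets({
    "LRU_A": (30000, 8000),
    "LRU_B": (12000, 9000),
    "LRU_C": (50000, 35000),
})
```

Here LRU_A and LRU_C exceed the threshold, while LRU_B is already removed at close to its failure rate and is not flagged.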

The  MTBUR  prediction  metric  can  be  applied  to  any  system  with  a  fully 
defined  fault  tree  analysis  model  and  design  change  can  be  implemented  based  on  the 
preceding discussion. Diagnosability comparisons, and ultimately cost comparisons,
based on this technique offer significant gains in insight.


5.0 APPLICATION AND EVALUATION OF MTBUR PREDICTION METRIC


The procedures introduced in the previous sections allow the designer to
accurately model an existing system to shed light on which LRUs are a source of
diagnosability problems. The designer can also incorporate system changes and see
precisely how time and cost are affected. For the BACS, the PRSOV is a known
diagnostic challenge due to its historically high rate of unjustifiable removals. Previous
work  (Clark,  1993)  suggests  a  comparison  of  metrics  such  as  c  from  equation  11  to 
identify  components,  like  the  PRSOV,  with  potential  diagnosability  problems  and  then 
an  application  of  equation  14  to  find  a  weighted  distinguishability  for  modified  systems 
to  see  if  an  improvement  is  achieved.  Application  of  the  MTBUR  prediction  metric 
allows  for  an  immediate  evaluation  of  not  only  which  LRUs  pose  a  threat  to 
diagnosability,  but  which  improvements  in  diagnosability  are  feasible. 

The current 737 BACS design is the testing ground for the MTBUR prediction
metric  in  section  5.1.  Section  5.2  applies  the  design  change  suggestions  of  section  4.3 
to  develop  several  redesigns  of  the  system.  An  evaluation  based  on  MTBUR  changes 
and  cost  savings  is  presented  along  with  recommendations  in  section  5.3.  Section  5.4 
addresses  the  issue  of  spares  provisioning. 


5.1 Application of MTBUR prediction to the original 737 BACS

As stated earlier, only active/independent failures, which make up the vast
majority of unjustifiable removals (over 90%), are analyzed. From the fault tree analysis
model of figure 16, the section 4.2 metrics can be applied for each LRU to arrive at a
predicted MTBUR. An example spreadsheet of the original system analysis for the
PRSOV is located in appendix B. Using the DEPCOST model for historical values of
each LRU’s MTBUR, an evaluation of the prediction metric may be accomplished.
Table 2 includes values of historical versus predicted MTBUR.

LRU          HISTORICAL   PREDICTED
HPSOV             38931
PRSOV              5394        6789
PCLR              65758       76841
duct              11000       11827
FAMV              16421       27123
CHECK            309102      319140
HSreg             10985       15659
(unlabeled)       15168       24106
Breg              11607       16700
Thermo            13799       89645

Table 2. Historical versus predicted MTBUR


Several  LRUs  (HPSOV,  Breg,  and  duct)  had  no  MTBUR  listed.  Based  on 
engineering  judgment,  these  LRUs  were  assigned  an  MTBUR  equal  to  twice  their 
historical  mean  time  between  failures  (MTBF).  Other  omitted  items  include  the 
SLHPR  and  spares  cost  of  the  Breg  and  HPSOV  which  are  estimated  at  values  of 
similar  equipment  (HSreg  and  PRSOV  values,  respectively,  varying  slightly  due  to 
complexity differences). The predicted values fall within approximately twenty percent
of the historical values with the exception of the 450°F thermostat. This anomaly could be
explained  by  organizational  factors  outside  the  scope  of  this  research,  e.g.,  direction 
from  higher  levels  because  of  low  spares  cost,  ease  of  maintenance,  least  SLHPR,  or 
merely  politics,  since  the  LRU  should  last  much  longer  based  on  its  failure  rate. 

The ultimate evaluation involves comparing the cost of the predicted versus historical
system using the DEPCOST model directly. A comparison of cost and MTBUR can be
accomplished by viewing figures 18 and 19. These figures are constructed by
modifying the MTBUR input column of the DEPCOST model to reflect first historical
values and then predicted values of MTBUR. The 450°F thermostat is excluded from
subsequent analysis due to the assumed organizational factors mentioned earlier as well
as the LRU’s negligible effect on overall cost savings compared to all other LRUs
in the system. It should be noted that in all DEPCOST analyses only one spare per
LRU is considered to gain savings per unit LRU.

Figure 19. DEPCOST model of historical MTBURs

Figures 18 and 19 validate the MTBUR prediction metric. Not only are
predicted MTBURs and costs within an acceptable range of historical values, but order
is preserved with respect to both candidates for diagnosability problems and cost
drivers. With this information, the choice of LRUs and functions for redesign can be
easily made.

Since no passive failures are addressed in this research, one would anticipate a
higher predicted MTBUR and therefore a lower cost than the historical values, as
figures 18 and 19 illustrate. A sample DEPCOST model spreadsheet can be found in
appendix C (for analysis, all information not pertaining to this research is removed).


5.2  System  Modification  and  Comparison 


All redesigns are based not only on diagnosability improvements, but also on
cost savings since, as noted in section 2.0, cost is always the common denominator.
Seven design modifications are studied, and evaluations for each based on feasibility and
logic are given in accordance with the design/redesign methodology discussed in
section  4.3.  The  benchmark  for  all  design  comparisons  is  the  original  design  using 
predicted  values  of  MTBUR  for  continuity.  A  sample  spreadsheet  analysis  and 
DEPCOST  illustration  for  each  change  is  located  in  appendix  C. 


5.2.1 Change 1 - Remove Pressure Function from PRSOV

Since the PRSOV was a point of interest in previous research involving the 747-400
BACS, and apparently is in the present analysis as well, the most successful system
change suggested in that analysis (Clark, 1993) is incorporated in the first modification.
This change follows all suggestions found in section 4.3 and involves essentially
removing the pressure regulating function of the PRSOV.




Like  the  temperature  control  function,  the  pressure  control  function  of  the 
PRSOV  is  shared  by  other  LRUs.  In  this  case,  the  pressure  is  regulated  directly  at  the 
high  and  low  pressure  ports  instead  of  at  the  junction  of  the  two  just  prior  to  the 
precooler.  This  change  requires  the  check  valve  to  be  replaced  by  a  control  valve. 
Also,  the  Breg  must  then  be  moved  to  the  new  control  valve  to  monitor  downstream 
pressure  and  signal  a  bleed  trip  off  indication  in  the  event  of  an  overpressurization. 

Based on benchmark MTBUR and cost, change 1 increases the MTBUR for the
PRSOV by 51 percent, decreases the MTBUR for the check valve by 79 percent, and
slightly decreases the MTBUR for the Breg. Since the check valve is converted to a
control valve, the failure rate of its counterpart control valve, the HPSOV, is assigned
to the check valve, bringing its MTBUR down sharply. Since the check valve is
more resistant to cost change than the PRSOV due to labor time and ambiguity, overall
cost is in favor of the PRSOV. The cost savings for this system change is on the order
of 8 percent, a significant amount based on the size and complexity of an aircraft
system.

The  feasibility  of  this  design  change  can  be  approached  from  two  directions. 
The  number  of  LRUs  remains  constant,  and  hence  the  complexity  does  not  increase  nor 
do  the  functional  requirements  change  drastically.  Even  the  relationship  of  the  Breg  is 
not  significantly  altered  since  it  was  remotely  located  from  the  PRSOV  anyway.  Yet, 
considering  the  limited  amount  of  space  available  in  this  particular  system,  any  change 
in  size  and  complexity  at  the  LRU  level  could  be  restrictive,  i.e.,  making  the  check 
valve  a  control  valve.  Also,  keeping  the  bleed  trip  off  functional  relationship  with  the 
PRSOV  requires  an  additional  control  line  from  the  Breg. 

As an original design for future aircraft (737-600, 700, 800...), change 1 is a feasible and logical way to address the unjustifiable removal problem, but it is not a “quick fix” for current aircraft.


5.2.2 Change 2--Add PRSOV Closed Sensor Light

Once  again,  the  methodology  suggestions  of  section  4.3  are  heeded  and  the 
PRSOV  is  targeted  once  more.  Using  an  existing  design  modification  based  on  the 
747-400  BAGS  design,  a  PRSOV  closed  sensor  light/indication  is  added  to  the  system 
to  arrest  the  unjustifiable  removals  of  at  least  that  particular  LRU.  Since  70  percent  of 
the  PRSOV  failure  modes  are  in  the  closed  position,  this  modification  promises 
significant  impact. 

This modification simply adds a limit switch type sensor that gives the aircraft crew, and thus troubleshooting personnel, an indication when the valve is in its closed position (indication 6 for analysis). Thus, if an indication 2 (bleed pressure low) occurs without an indication 6 (PRSOV closed), a PRSOV failure can be discounted. This decrease in the ambiguity of indication 2, which is the most ambiguous indication, should aid overall system diagnosability.
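The effect on ambiguity can be illustrated with a small candidate filter. This is a sketch under stated assumptions: only four of the LRUs that share indication 2 are listed, the signatures follow the LRU-indication pairings in the appendix spreadsheets (PRSOV failed closed shows indication 2 in the benchmark and indications 2 and 6 after change 2), and the function and variable names are hypothetical.

```python
def candidates(observed, signatures):
    """Return the LRUs whose failure signature equals the observed indication set."""
    return sorted(lru for lru, indications in signatures if indications == observed)


# Abbreviated signature lists: (LRU, set of indications its failure produces).
BENCHMARK = [("PRSOV", {2}), ("check valve", {2}), ("duct", {2}), ("FAMV", {2})]
CHANGE_2 = [("PRSOV", {2, 6}), ("check valve", {2}), ("duct", {2}), ("FAMV", {2})]

# Bleed pressure low (2) without PRSOV closed (6).
print(candidates({2}, BENCHMARK))   # PRSOV is still a candidate
print(candidates({2}, CHANGE_2))    # PRSOV is ruled out
print(candidates({2, 6}, CHANGE_2)) # only the PRSOV matches
```

Matching on the exact indication set mirrors the way the spreadsheets treat a combined indication such as 26 as distinct from indication 2 alone.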

Relative to the benchmark, the MTBUR of the PRSOV increases by 34 percent, and all other MTBURs increase slightly as well, with the exception of the check valve’s, which decreases slightly because of the system metric dynamics: the check valve’s only indication, 2, remains ambiguous, so a decrease in the number of high failure rate candidates increases the false detections of the remaining low failure rate LRUs. Overall cost savings is approximately 7 1/2 percent.

This modification exemplifies the age-old battle between BITE and increased weight and complexity. Modern sensors have a reliability at least an order of magnitude above that of the parent system and weigh as little as a dime, yet even the slightest increase in weight and complexity can substantially increase cost in terms of fuel and assembly hours, especially for aircraft systems. From the human factors standpoint, there is a point of diminishing returns on information available to crewmembers in the form of indications, but since this indication is continuous and can be recorded, reaching that point with this indication is doubtful.

Since so many system variables comprise fuel saving strategies, the cost benefit seems to be in favor of increased weight based on the amount of savings this change produces. Even in this particular system, there is always enough room under the cowling for “just one more sensor”.


5.2.3 Change 3--Add Indication 3 to PRSOV

Targeting  the  PRSOV  once  again  since  it  appears  to  have  the  most  room  for 
improvement,  the  function-indication  relationship  is  modified  to  decrease  the  ambiguity 
of  indication  2  in  much  the  same  way  as  adding  a  sensor. 

Some type of relationship between the PRSOV and existing indications or LRUs is sought because of the high failure rate of the PRSOV in the closed position. Considering that the bleed trip off light illuminates whenever a bleed trip occurs, and that a bleed trip closes the PRSOV in the case of overheat or overpressure, an association is already in place. Merely running the bleed trip off light (indication 3) wire from the PRSOV closed position, instead of from the overheat and overpressure switches which currently signal the indication, not only reduces the ambiguity of indication 2 but also maintains system integrity by changing no functions and adding no sensors. This modification simply changes the PRSOV failed closed indication from indication 2 to indication 23.

The MTBUR of the PRSOV increases by 29 percent, and the MTBURs of the HSreg, duct, Breg, HPSOV, and PCLR increase slightly, primarily due to the decrease in ambiguity of indication 2, which these LRUs share. The MTBURs of all other LRUs decrease slightly because they are associated with both indications 2 and 3, with which the PRSOV is now also associated (the check valve is the exception: its MTBUR decreases for the reason stated in section 5.2.2). The overall cost savings for this modification is almost 6 1/2 percent.

This modification seems very feasible, mainly because of its simplicity. According to Boeing publications, the bleed trip off light is triggered by an overpressure (> 180 ± 10 psi) at the inlet of the PRSOV, which is monitored by an overpressure switch inside the remotely located Breg. The indication is also triggered by an overheat (> 490° ± 10°F) out of the precooler, which is monitored by an overheat switch just downstream of the precooler. This change would replace the two wires running from the switches with one wire running only from the PRSOV to the bleed trip off light. A drawback would be an apparent need to install a limit switch sensor in the PRSOV to monitor its position and relay the message to the indication, therefore adding a sensor like change 2 but not decreasing the ambiguity as much as a separate indication might.

Overall,  this  design  mentality  is  logical.  Scrutiny  reveals  that  complexity  is 
even  reduced  if  the  bleed  trip  off  light  signal  wires  are  removed  from  the  Breg 
overpressure  and  overtemperature  switches.  Of  course,  a  modification  like  this  may 
take  more  hours  of  overhaul  than  desired.  In  addition,  even  though  indication  2 
decreases  in  ambiguity,  indication  23  increases  in  ambiguity.  In  light  of  the  above 
discussion,  change  3  promises  to  be  a  sound  design. 


5.2.4 Change 4--Add Indication 3 to PRSOV and FAMV

From the original DEPCOST analysis it appears that, after the PRSOV, the FAMV has the next most room for improvement based on the suggestions of section 4.3. Since the FAMV already has a “sometimes” relationship with indication 3, making it a hard failure (an “always” relationship) does not seem out of the question.

From  a  mechanical  standpoint,  whenever  the  FAMV  fails  in  the  closed  position, 
the  PCLR  will  not  receive  any  cooling  air  from  the  engine  fan.  This  should  cause  an 
overheat  condition  an  overwhelming  majority  of  the  time.  A  wire  and  probably  a  limit 
switch  sensor  must  be  added  to  the  FAMV  to  incite  the  bleed  trip  off  light  whenever  a 
failure  occurs.  This  modification  is  applied  in  conjunction  with  the  modification  in  the 
previous  section  for  analysis  purposes. 

From the original benchmark, the MTBUR of the PRSOV increases by 22 percent. All other LRUs are affected in approximately the same manner and to the same degree as in the previous change. Even the MTBUR of the FAMV decreases slightly. The overall cost savings is almost 6 percent, less than that of change 3 alone.

The faulty logic in this redesign is that it increases the failure rate of an already high failure rate indication (23) at least as much as it decreases the failure rate of an already improved indication (2), thus nullifying any gains previously made. Also, even though the FAMV has much room for improvement, it does not have much room in the particular failure mode targeted (only 30 percent of all failures are in the closed mode). From a mechanical standpoint, the same arguments apply as those given against modification 3, but twofold since another sensor must be added.

Not  only  must  an  LRU  with  a  high  potential  for  improvement  be  targeted,  but 
the  particular  failure  mode  that  causes  most  of  its  failures  must  be  addressed. 
Modification  4  is  not  recommended. 


5.2.5 Change 5--Add PRSOV Closed & FAMV Open Sensors

The lesson learned from the previous section is applied by combining change 2, from the 747 design, with a sensor addition on the FAMV. The open position of the FAMV along with the closed position of the PRSOV is targeted by adding two sensors to the system.

In  addition  to  the  PRSOV  modification  discussed  in  section  5.2.2,  a  limit  switch 
sensor  must  be  added  to  monitor  the  failed  open  position  of  the  FAMV  which  accounts 
for  70  percent  of  its  failures.  These  two  sensors  decrease  the  ambiguity  of  two 
ambiguous  indications  (2  and  5)  while  increasing  the  diagnosability  of  the  two  highest 
cost  drivers. 

The MTBURs of the PRSOV, PCLRsen, and FAMV are significantly increased, while those of the HSreg, duct, Breg, and HPSOV are increased slightly. The MTBURs of the PCLR and check valve are decreased slightly due to an increase in their probability of false detection, which has little influence on cost. The overall cost savings is over 10 percent.

The BITE versus weight and complexity conflict arises again for this configuration. The cost analysis of added weight is not included in this research, but it is doubtful that the cost of two lightweight sensors would encroach upon the savings realized.


5.2.6 Change 6--Add PRSOV Closed & FAMV Stuck Sensors

Iterating the previous change one more step to arrest all unjustifiable removals of the FAMV, a “stuck” sensor is added in lieu of a stuck open sensor. The FAMV is the second highest cost LRU in terms of replacements and definitely a cost driver in terms of diagnosability, so this modification is analyzed with optimism.

Preferably,  a  stuck  sensor  would  be  no  more  complex  than  a  single  limit  switch. 
Since  the  LRU  in  question  consists  of  a  butterfly  valve,  a  sensor  placed  on  the  axis  of 
the  valve  could  monitor  any  movement,  or  lack  thereof.  No  additional  sense  lines 
would  be  necessary  from  the  previous  modification.  Worst  case,  two  limit  switches 
(open  and  closed)  would  be  required. 

The  analysis  shows  significant  increases  in  all  LRU  MTBURs  especially  the 
PRSOV  (34  percent)  and  FAMV  (25  percent).  The  overall  cost  savings  is  12  percent. 

By virtually eliminating all unjustifiable removals of the FAMV (bringing the MTBUR of the LRU up to its MTBF), a relatively simple modification realizes almost twice the savings of the 747 design.


5.2.7 Change 7--Add PRSOV & FAMV Stuck Sensors

The final modification of this analysis iterates the previous modification one more time by incorporating a “stuck” sensor on both the PRSOV and the FAMV. This modification essentially eliminates all unjustifiable removals of the two least diagnosable, highest cost drivers in the pneumatic system.

Both the PRSOV and FAMV incorporate butterfly-type valves for their operation, so both could be fitted with the same “stuck” sensor mentioned in section 5.2.6. Once again, complexity is not increased to a great extent and added weight does not seem to threaten feasibility.

Based on the benchmark once more, all LRU MTBURs realize a substantial increase: PRSOV 65 percent; PCLRsen 54 percent; FAMV 25 percent; and all others over 3 percent. The overall cost savings is over 16 percent.

This change is recommended over all other changes due to its simplicity and ease of retrofitting current aircraft designs. Information from the Boeing Company and the Federal Aviation Administration (FAA) implies that, due to certification practices, sensor-based modifications realize bigger cost savings than complete component overhaul. Of the changes analyzed in this BAGS MTBUR-based research, change 7 is expected to deliver the greatest cost benefit with the least retrofit time loss. A summary of modification results based on predicted diagnosability cost is shown in table 3.


Original design cost = $85,715

DESIGN      COST        % SAVINGS
Change 1    $78,673      8.2
Change 2    $79,316      7.5
Change 3    $80,187      6.5
Change 4    $80,696      5.9
Change 5    $77,032     10.1
Change 6    $75,293     12.2
Change 7    $71,715     16.3

Table 3. Cost analysis of modifications.
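The savings column of table 3 follows from the cost column by simple arithmetic. The sketch below recomputes it as (original cost minus modified cost) divided by original cost; the dictionary and function names are illustrative only, and the 7.5 entry for change 2 is taken from the "approximately 7 1/2 percent" stated in section 5.2.2.

```python
ORIGINAL_COST = 85715  # original design cost, dollars

# Predicted diagnosability cost per design change, dollars (table 3).
COSTS = {1: 78673, 2: 79316, 3: 80187, 4: 80696, 5: 77032, 6: 75293, 7: 71715}

# Percent savings as listed in table 3.
TABLE_3_PERCENT = {1: 8.2, 2: 7.5, 3: 6.5, 4: 5.9, 5: 10.1, 6: 12.2, 7: 16.3}


def percent_savings(cost, baseline=ORIGINAL_COST):
    """Savings relative to the baseline cost, in percent."""
    return 100.0 * (baseline - cost) / baseline


savings = {change: percent_savings(cost) for change, cost in COSTS.items()}
for change in sorted(savings):
    print(f"Change {change}: {savings[change]:.2f} percent (table: {TABLE_3_PERCENT[change]})")
```

The recomputed values agree with the tabulated ones to within a tenth of a percentage point.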


5.3  Spares  Provisioning 


All  prior  cost  analyses  consider  only  the  cost  per  unit  LRU.  The  DEPCOST 
model  includes  a  spares  holding  cost  found  by  equation  21. 


HoldingCost = NumberOfSpares * CostPerSpare * i (21)

where i = (MARR - Inflation Rate) / (1 + Inflation Rate).

If the number of spares is found using a Poisson distribution with a spares availability of 95 percent, a change in LRU MTBUR is likely to have an impact on overall diagnosability cost.

The Boeing Company’s algorithm for computing the number of spares is based on the Poisson expansion

\[
\sum_{k=0}^{r} \frac{e^{-N} N^{k}}{k!} > \text{fill rate}\ (0.95) \qquad (22)
\]

where e is the base of the natural logarithm, r+1 is the number of spares required to satisfy the fill rate, and N is found from equation 23.

N = QPA * FlightHours * TurnDays * RR / 365 (23)

where QPA is the quantity per airplane, FlightHours is the fleet size multiplied by the average flight hours per airplane in one year, TurnDays is the time in the shop (14 days for electrical components and 30 days for mechanical components), and RR is the removal rate, which is the inverse of MTBUR. An increase in MTBUR should therefore decrease the cost of the system: N is inversely proportional to MTBUR, so a higher MTBUR lowers the number of spares required and, with it, the holding cost.
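A minimal sketch of the spares computation of equations 22 and 23 follows. It is not Boeing's algorithm, only an illustration of the Poisson cutoff it is said to be based on. The function names, the fleet size, and the flight hours are hypothetical; the two MTBUR values are the predicted PRSOV MTBURs from the appendix spreadsheets.

```python
import math


def expected_demand(qpa, fleet_flight_hours, turn_days, mtbur_hours):
    """Equation 23: expected removals during one shop turnaround."""
    removal_rate = 1.0 / mtbur_hours  # RR is the inverse of MTBUR
    return qpa * fleet_flight_hours * turn_days * removal_rate / 365.0


def poisson_cutoff(n, fill_rate=0.95):
    """Smallest r whose cumulative Poisson probability exceeds fill_rate (equation 22)."""
    if n < 0 or n > 700:
        raise ValueError("n must lie in [0, 700] to avoid underflow of exp(-n)")
    term = math.exp(-n)  # k = 0 term of the expansion
    cumulative = term
    r = 0
    while cumulative <= fill_rate:
        r += 1
        term *= n / r  # e^-n n^r / r!, built from the previous term
        cumulative += term
    return r


# Hypothetical fleet: 100 airplanes at 3000 flight hours per year, 2 units per airplane,
# 30 shop days for a mechanical component.
for label, mtbur in (("benchmark", 6789.0747), ("change 1", 10279.3165)):
    n = expected_demand(qpa=2, fleet_flight_hours=100 * 3000, turn_days=30, mtbur_hours=mtbur)
    print(label, round(n, 2), poisson_cutoff(n))
```

The function returns the cutoff r; the text above counts r+1 as the number of spares required. Because N scales with the inverse of MTBUR, the cutoff for the higher MTBUR can never exceed that for the lower one.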

Incorporating  the  required  number  of  spares  for  the  system,  an  overall  system 
cost  comparison  can  be  made.  Table  4  presents  a  summary  of  the  modification  results 
to  include  the  cost  of  actual  spares  provisioning. 


Original design cost = $122,258

DESIGN      COST         % SAVINGS
Change 1    $112,443      8.0
Change 2    $113,085      7.5
Change 3    $115,343      5.7
Change 4    $115,852      5.2
Change 5    $107,651     12.0
Change 6    $105,912     13.4
Change 7    $102,334     16.3

Table 4. Cost analysis of modifications including spares provisioning.

Actual spares provisioning reveals smaller savings for changes 3 and 4, but an increase in savings for changes 5 and 6. The majority of cost savings from the decrease in the number of required spares is due to the PRSOV and FAMV, falling directly in line with the redesign methodology of section 4.3.


6.0 CONCLUSION


The  growing  life  cycle  cost  dependency  of  quality  products  is  prompting  design 
engineers  to  meet  product  specifications  with  diagnosability  as  a  major  ingredient.  This 
research  has  addressed  diagnosability  analysis  for  mechanical  systems  quantitatively  by 
means  of  LRU-indication  relationships.  These  relationships,  along  with  structure 
which  is  defined  by  maintenance  time,  essentially  determine  the  diagnosability  of  a 
system.  As  system  LRU  functions  and  indications  are  modified,  diagnosability  also 
changes  based  on  the  reliability  of  each  LRU  and  the  ambiguity  of  each  indication. 

The  MTBUR  of  each  system  LRU  is  a  direct  measure  of  diagnosability.  A 
generic  metric  was  developed  to  predict  LRU  MTBURs  for  any  system  made  up  of 
several  LRUs  that  give  some  indication  of  failure.  The  MTBUR  of  a  particular  LRU  is 
directly  related  to  the  probability  of  detecting  that  particular  LRU  and  its  time  to  repair 
given  a  failure  indication  including  other  LRUs.  The  value  of  MTBUR  for  each  LRU 
can  be  compared  to  that  of  other  LRUs  to  determine  which  ones  present  a  diagnostic 
challenge.  System  changes  based  on  this  information  can  then  be  made  to  decrease  the 
cost  of  diagnosability. 

The  MTBUR  prediction  metric  was  applied  to  the  737  BAGS  to  determine 
system  improvements.  LRU  evaluation  presented  the  PRSOV  and  FAMV  as  primary 
candidates  for  diagnosability  improvement.  The  life  cycle  costing  mechanism, 
DEPCOST  model,  was  used  to  evaluate  system  cost  based  on  the  diagnosability 
parameters  of  unjustified  removals,  spares  cost,  and  maintenance  time.  Seven  design 
changes  were  suggested  and  analyzed  based  on  MTBUR,  cost,  and  feasibility.  These 
redesigns  modify  LRU  indications  by  optimizing  current  indications  or  by  adding 
sensors  to  strategic  LRUs.  Evaluations  of  the  redesigns  revealed  an  improvement  in 
diagnosability  directly  impacting  the  cost  of  the  system. 

Quality through diagnosability cannot be neglected in today’s marketplace. With cost as the common metric for design evaluation, and with analysis factors contributing to extensive downtime costs, design for diagnosability should be more than mere happenstance considered after the product is launched. The relationships of diagnosability developed here can be directly compared with other common design decision-making variables, such as manufacturability and ease of assembly, in the arena of life cycle costing. Future research is expected to address the structure of designs explicitly in terms of maintenance hours. This will especially enhance prediction techniques for systems that lack historical data.


Frequency-Based Vibration Troubleshooting Checklist

Columns: vibration frequency / possible cause / comments

1 x rpm
  Imbalance: Steady phase that follows the transducer. Can be caused by load variation, material buildup, or pump cavitation.
  Misalignment or bent shaft: High axial levels. 180 deg phase relation at the shaft ends. Usually accompanied by high 2 x rpm frequency.
  Strain: Caused by casing or foundation distortion, or from attached structures (e.g., piping).
  Looseness: Directional; changes with transducer location. Usually accompanied by high harmonic content and random phase.
  Resonance: Caused by attached structures; drops off sharply with change of speed.
  Electrical: Broken rotor bar in induction motor. Often accompanied by sidebands of 2 x motor slip frequency.

2 x rpm
  Misalignment or bent shaft: High levels of axial vibration.

Harmonic
  Looseness: Large number of harmonics; impulsive or truncated time waveform.
  Rubbing: Shaft contacting machine housing.

Sub-rpm
  Oil whirl: Unstable phase; typically 0.43 to 0.48 of rpm.
  Bearings: Fundamental train = (1/2) x (RPM/60) x [1 - (Ball Diameter / Pitch Diameter) x cos(contact angle)]

N x rpm
  Rolling element bearings:
    Inner race = (#Balls/2) x (RPM/60) x [1 + (Ball Diameter / Pitch Diameter) x cos(contact angle)]
    Outer race = (#Balls/2) x (RPM/60) x [1 - (Ball Diameter / Pitch Diameter) x cos(contact angle)]
    Ball defect = (Pitch Diameter / (2 x Ball Diameter)) x (RPM/60) x [1 - (Ball Diameter / Pitch Diameter)^2 x (cos(contact angle))^2]
    Usually modulated by running speed.
  Gears: Gearmesh (#teeth x RPM); usually modulated by running speed.
  Belts: Belt x running speed and 2 x running speed.
  Blades/vanes: #blades/vanes x rpm; usually present in a normally-running machine. Harmonics indicate that a problem exists.

Resonance
  A number of possible sources, including shaft, casing, foundation, and attached structures. Frequency is proportional to stiffness and inversely proportional to mass. Run-up tests and modal analysis are useful in diagnosis.

(from [illegible] Engineering Corp.)

Figure A2. FFT troubleshooting checklist


MTBUR Calculations

Aircraft: 737-300,400,500     LRU of interest: prsov (i = 89.135)

LRU          fail rate   MTBF        PC          LLHPR   SLHPR
hpsov        13.882      72035.73    0.0013882   4.5     4.64
prsov        89.135      11218.94    0.0089135   3.05    4.64
pclr         8.804       113584.7    0.0008804   4       10
duct         45.455      21999.78    0.0045455   4       2
famv         29.578      33808.91    0.0029578   7.66    8.92
check        1.34        746268.7    0.000134    4       1.8
HSreg        37.67       26546.32    0.003767    3.13    5.38
PCLRsensor   16.805      59506.1     0.0016805   2.24    1.53
Breg         43.077      23214.24    0.0043077   9.94    5.38
Thermo       9.058       110399.6    0.0009058   4.15    1.39

Column definitions (as legible in the header row):
  failrateind = sum of FRs per indication
  Ci          = number of candidates
  PDi         = [definition illegible in scan]
  MTBFind     = 1/failrateind x 1e6
  failraten-i = sum FRn - FRi|ind
  MTBFn-i     = 1/FRn-i x 1e6
  MTBURi-u    = MTBFn-i / PDi
  failratei-u = 1/MTBURi-u x 1e6

Indication   candidates              failrateind   Ci   MTBFind
1            h,pr,H,B                [illegible]   4    [illegible]
13           h,H                     4.4611        2    224160
2            h,pr,pc,d,f,c,H,P,B,T   151.756       10   6589.525
23           d,f,P                   4.5919        3    217774.8
24           pc,d                    5.8661        2    170471
245          pc,d                    2.24405       2    445622.9
25           pc,d                    1.3493        2    741125
3            d,T                     9.0613        2    110359.4
4            pc,d                    2.71295       2    368602.4
5            d,f,P                   33.83175      3    29558.03

Indications with the prsov as a candidate (remaining entries for all other indications are 0):

Indication   PDi           failraten-i   MTBFn-i       MTBURi-u      failratei-u
1            [illegible]   [illegible]   [illegible]   [illegible]   [illegible]
2            0.418728      89.3615       11190.5       26725.009     37.41813

LRU       indication   % of FR   failrateperind
hpsov     1            25        3.4705
hpsov     13           5         0.6941
hpsov     2            70        9.7174
prsov     1            30        26.7405
prsov     2            70        62.3945
pclr      2            65        5.7226
pclr      24           15        1.3206
pclr      245          10        0.8804
pclr      25           5         0.4402
pclr      4            5         0.4402
duct      2            70        31.8185
duct      23           5         2.27275
duct      24           10        4.5455
duct      245          3         1.36365
duct      25           2         0.9091
duct      3            2         0.9091
duct      4            5         2.27275
duct      5            3         1.36365
famv      2            25        7.3945
famv      23           5         1.4789
famv      5            70        20.7046
check     2            100       1.34
HSreg     1            45        16.9515
HSreg     13           10        3.767
HSreg     2            35        13.1845
PCLRsen   2            25        4.20125
PCLRsen   23           5         0.84025
PCLRsen   5            70        11.7635
Breg      1            55        23.69235
Breg      2            35        15.07695
Thermo    2            10        0.9058
Thermo    3            90        8.1522

Totals
  Tot Failrate n-i (sum of FRn-i column)    133.47585
  Tot MTBF n-i (1/Tot FRn-i x 1e6)          7491.9919970541
  failratei-u (sum of FRi-u column)         58.16048
  MTBUR i-u (1/FRi-u x 1e6)                 17193.807
  MTBUR i-i (MTBFi)                         11218.94

Predicted MTBUR i = 1/(1/MTBURun + 1/MTBURj) = 6789.0747
Historical MTBUR i = 5394


Figure B1. Sample Quattro Pro MTBUR calculation spreadsheet
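The per-indication columns of figure B1 can be reproduced from the LRU-indication table. The sketch below is an illustration under assumptions, not the original spreadsheet: it uses only the four LRUs that are candidates for indications 1 and 13 in the benchmark, treats failure rates as failures per 10^6 time units (as implied by the spreadsheet's 1e6 factor), and uses hypothetical names throughout.

```python
from collections import defaultdict

# LRU failure rates (figure B1, fail rate column).
FAIL_RATE = {"hpsov": 13.882, "prsov": 89.135, "HSreg": 37.67, "Breg": 43.077}

# Percent of each LRU's failure rate that shows up as a given indication.
PERCENT_OF_FR = {
    "hpsov": {"1": 25, "13": 5},
    "prsov": {"1": 30},
    "HSreg": {"1": 45, "13": 10},
    "Breg": {"1": 55},
}

fail_rate_ind = defaultdict(float)  # failrateind: summed failure rate per indication
candidate_lrus = defaultdict(list)  # LRUs that can produce each indication

for lru, split in PERCENT_OF_FR.items():
    for indication, percent in split.items():
        fail_rate_ind[indication] += FAIL_RATE[lru] * percent / 100.0
        candidate_lrus[indication].append(lru)

for indication in sorted(fail_rate_ind):
    ci = len(candidate_lrus[indication])           # Ci: number of candidates
    mtbf_ind = 1e6 / fail_rate_ind[indication]     # MTBFind
    print(indication, round(fail_rate_ind[indication], 5), ci, round(mtbf_ind, 1))
```

For indication 13 this reproduces the tabulated failure rate of 4.4611 with two candidates; indication 2 is omitted because its remaining candidates are not part of this abbreviated data set.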


APPENDIX  C 


DEPCOST  model  spreadsheet 




[Chart: MTBUR (axis 0 to 100000) and cost (axis $0 to $70,000) per LRU, with grand total cost; total cost $78,673]


Aircraft: 737-300,400,500     LRU of interest: prsov (i = 89.135)

LRU           fail rate   MTBF        PC          LLHPR   SLHPR
hpsov         13.882      72035.73    0.0013882   4.5     4.64
prsov         89.135      11218.94    0.0089135   3.05    4.64
pclr          8.804       113584.7    0.0008804   4       10
duct          45.455      21999.78    0.0045455   4       2
famv          29.578      33808.91    0.0029578   7.66    8.92
check         13.882      72035.73    0.0013882   4       1.8
HSreg         37.67       26546.32    0.003767    3.13    5.38
PCLR sensor   16.805      59506.1     0.0016805   2.24    1.53
Breg          43.077      23214.24    0.0043077   9.94    5.38
Thermo        9.058       110399.6    0.0009058   4.15    1.39

Indication   candidates         failrateind   Ci   MTBFind
1            h,H,B              41.9605       3    23831.94
13           h,H,B              6.61495       3    151172.7
2            h,pc,d,f,H,P,B,T   88.0215       8    11360.86
23           d,f,P              4.5919        3    217774.8
24           pc,d               5.8661        2    170471
245          pc,d               2.24405       2    445622.9
25           pc,d,pr,c          64.4379       4    15518.82
3            d,T,pr             35.8018       3    27931.56
4            pc,d               2.71295       2    368602.4
5            d,f,P              33.83175      3    29558.03

Indications with the prsov as a candidate (remaining entries for all other indications are 0):

Indication   PDi        failraten-i   MTBFn-i    MTBURi-u    failratei-un
25           0.964042   2.0434        489380.4   507633.74   1.969924
3            0.68178    9.0613        110359.4   161869.69   6.177809

LRU       indication   % of FR   failrateperind
hpsov     1            25        3.4705
hpsov     13           5         0.6941
hpsov     2            70        9.7174
prsov     3            30        26.7405
prsov     25           70        62.3945
pclr      2            65        5.7226
pclr      24           15        1.3206
pclr      245          10        0.8804
pclr      25           5         0.4402
pclr      4            5         0.4402
duct      2            70        31.8185
duct      23           5         2.27275
duct      24           10        4.5455
duct      245          3         1.36365
duct      25           2         0.9091
duct      3            2         0.9091
duct      4            5         2.27275
duct      5            3         1.36365
famv      2            25        7.3945
famv      23           5         1.4789
famv      5            70        20.7046
check     25           5         0.6941
HSreg     1            45        16.9515
HSreg     13           10        3.767
HSreg     2            35        13.1845
PCLRsen   2            25        4.20125
PCLRsen   23           5         0.84025
PCLRsen   5            70        11.7635
Breg      1            50        21.5385
Breg      13           5         2.15385
Breg      2            35        15.07695
Thermo    2            10        0.9058
Thermo    3            90        8.1522

Totals
  Tot Failrate n-i (sum of FRn-i column)    11.1047
  Tot MTBF n-i                              90051.959950909
  failratei-un (sum of FRi-u column)        8.147733
  MTBUR i-un (1/FRi-u x 1e6)                122733.524
  MTBUR i-i (MTBFi)                         11218.94

Predicted MTBUR i = 1/(1/MTBURun + 1/MTBURj) = 10279.3165
Historical MTBUR i = 5394

Figure  C2.  Spreadsheet  calculation  and  DEPCOST  illustration  for  change  1 




Aircraft: 737-300,400,500     LRU of interest: prsov (i = 89.135)

[LRU input table (fail rate, MTBF, PC, LLHPR, SLHPR): only fragments legible in scan]

Indication   candidates          failrateind   Ci   MTBFind
1            h,pr,H,B            70.85485      4    14113.4
136          h,H                 4.4611        2    224160
2            h,pc,d,f,c,H,P,B,T  89.3615       9    [illegible]
236          d,f,P               4.5919        3    [illegible]
5            d,f,P               33.83175      3    29558
26           pr                  62.3945       1    16027.1
[remaining rows not legible in scan]

Indications with the prsov as a candidate (remaining entries for all other indications are 0):

Indication   PDi        failraten-i   MTBFn-i   MTBURi-u   failratei-un
1            0.470195   44.1144       22668.4   48210.56   20.74234
26           1          0             0         0          0

LRU       indication   % of FR   failrateperind
hpsov     1            25        3.4705
hpsov     136          5         0.6941
hpsov     2            70        9.7174
prsov     1            30        26.7405
prsov     26           70        62.3945
pclr      2            65        5.7226
pclr      24           15        1.3206
pclr      245          10        0.8804
pclr      25           5         0.4402
pclr      4            5         0.4402
duct      2            70        31.8185
duct      236          5         2.27275
duct      24           10        4.5455
duct      245          3         1.36365
duct      25           2         0.9091
duct      36           2         0.9091
duct      4            5         2.27275
duct      5            3         1.36365
famv      2            25        7.3945
famv      236          5         1.4789
famv      5            70        20.7046
check     2            100       1.34
HSreg     1            45        16.9515
HSreg     136          10        3.767
HSreg     2            35        13.1845
PCLRsen   2            25        4.20125
PCLRsen   236          5         0.84025
PCLRsen   5            70        11.7635
Breg      1            55        23.69235
Breg      2            35        15.07695
Thermo    2            10        0.9058
Thermo    36           90        8.1522

Totals
  Tot Failrate n-i (sum of FRn-i column)    44.11435
  Tot MTBF n-i                              22668.360748827
  failratei-un (sum of FRi-u column)        20.74234
  MTBUR i-un (1/FRi-u x 1e6)                48210.564
  MTBUR i-i (MTBFi)                         11218.94

Predicted MTBUR i = 1/(1/MTBURun + 1/MTBURj) = 9101.0574
Historical MTBUR i = 5394

Figure C3. Spreadsheet calculation and DEPCOST illustration for change 2

Figure C4. Spreadsheet calculation and DEPCOST illustration for change 3

Figure C5. Spreadsheet calculation and DEPCOST illustration for change 4

Figure C6. Spreadsheet calculation and DEPCOST illustration for change 5

Figure C7. Spreadsheet calculation and DEPCOST illustration for change 6

Figure C8. Spreadsheet calculation and DEPCOST illustration for change 7


